Reading XML files with ElementTree (1)


Jonchann 2019. 5. 7. 17:14

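I downloaded the SemEval training data and it turned out to be an XML file.

After some searching, I found that Python can process XML files with import xml.etree.ElementTree as ET.

Documentation -> https://docs.python.org/ja/3/library/xml.etree.elementtree.html

For reference, the SemEval training data XML file starts out like this when opened.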

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
    <Review rid="79">
        <sentences>
            <sentence id="79:0">
                <text>Being a PC user my whole life....</text>
            </sentence>
            <sentence id="79:1">
                <text>This computer is absolutely AMAZING!!!</text>
                <Opinions>
                    <Opinion category="LAPTOP#GENERAL" polarity="positive"/>
                </Opinions>
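
The loops in the rest of this post assume that a root element already exists. Here is a minimal sketch of getting one. It assumes a shortened copy of the excerpt above with the closing tags filled in, held in a hypothetical string named SAMPLE_XML. With a real file on disk, ET.parse(path).getroot() should play the same role.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the excerpt above (closing tags added by hand).
SAMPLE_XML = """<Reviews>
    <Review rid="79">
        <sentences>
            <sentence id="79:0">
                <text>Being a PC user my whole life....</text>
            </sentence>
            <sentence id="79:1">
                <text>This computer is absolutely AMAZING!!!</text>
                <Opinions>
                    <Opinion category="LAPTOP#GENERAL" polarity="positive"/>
                </Opinions>
            </sentence>
        </sentences>
    </Review>
</Reviews>"""

# fromstring() parses a string and returns the root element directly.
root = ET.fromstring(SAMPLE_XML)

print(root.tag)        # tag of the top-level element
print(root[0].attrib)  # attributes of the first child, as a dict of strings
```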

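The first example in the documentation uses a for loop over the root:

for child in root:
    print(child.tag, child.attrib)

With the documentation's sample data this prints

country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

Run on the SemEval data, the same loop gives

child.tag => Review

child.attrib => {'rid': '79'}

(attribute values always come back as strings).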

But that is not what I need. I need the text and Opinions inside sentence, inside sentences, inside Review, inside Reviews. So I can just keep nesting for loops all the way down, right?

for review in root:
    for sentences in review:
        for sentence in sentences:
            for content in sentence:
                if content.tag == "text":
                    print(content.text)
                else:
                    for opinion in content:
                        print(opinion.tag, opinion.attrib)

It is hard to imagine uglier code than this. Would a list comprehension make it any better?

review = [cont.text for rev in root for sents in rev for sent in sents for cont in sent if cont.tag == "text"]

This is not usable either.

To avoid the for loops, use find.

sentence = root.find("Review").find("sentences").find("sentence")

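This single line goes straight down to sentence.

However, find returns only the first matching child at each step, so the only text this line can reach is the very first one, Being a PC user my whole life....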
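
To check this behavior, here is a small sketch that assumes a hypothetical two-sentence sample modeled on the SemEval excerpt. It also uses a path string in place of the chained find calls, which reaches the same element here, and calls findall for comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-sentence sample modeled on the SemEval excerpt.
root = ET.fromstring(
    '<Reviews><Review rid="79"><sentences>'
    '<sentence id="79:0"><text>Being a PC user my whole life....</text></sentence>'
    '<sentence id="79:1"><text>This computer is absolutely AMAZING!!!</text></sentence>'
    '</sentences></Review></Reviews>'
)

# find() with a path string walks Review -> sentences -> sentence
# and stops at the first match.
first = root.find("Review/sentences/sentence")
first_text = first.findtext("text")

# findall() with the same path returns every match as a list.
all_sentences = root.findall("Review/sentences/sentence")

print(first.attrib["id"], first_text)
print(len(all_sentences))
```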

So find still has to be combined with for loops to some extent.
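
One possible way to do that mixing is sketched below. It is not necessarily the approach this series ends up with. The sample string and the helper name extract_sentences are hypothetical. iter, findtext, and get are standard ElementTree element methods.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the SemEval excerpt.
root = ET.fromstring(
    '<Reviews><Review rid="79"><sentences>'
    '<sentence id="79:0"><text>Being a PC user my whole life....</text></sentence>'
    '<sentence id="79:1"><text>This computer is absolutely AMAZING!!!</text>'
    '<Opinions><Opinion category="LAPTOP#GENERAL" polarity="positive"/></Opinions>'
    '</sentence>'
    '</sentences></Review></Reviews>'
)


def extract_sentences(root):
    """Collect (sentence id, text, list of opinion attribute dicts)."""
    rows = []
    # iter() visits every <sentence> at any depth, so a single loop
    # replaces the nested loops over Review and sentences.
    for sentence in root.iter("sentence"):
        text = sentence.findtext("text")
        # A sentence without <Opinions> simply yields an empty list.
        opinions = [dict(op.attrib) for op in sentence.iter("Opinion")]
        rows.append((sentence.get("id"), text, opinions))
    return rows


rows = extract_sentences(root)
for row in rows:
    print(row)
```

This keeps a single for loop over the sentences and leaves the rest of the descent to findtext and iter.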

