Parsing data from an XML file in Python

Date: 2022-12-01 11:09:34

I have an XML file:

<swissprot created="2010-12-20">
 <entrylevel dataset="abc">
    <references id="1">
        <title>first references</title>
        <author>
            <person name="Mr. A"/>
            <person name="Mr. B"/>
            <person name="Mr. C"/>
        </author>
        <score> score1 for id 1 </score>
        <score> score2 for id 1 </score>
        <score> score3 for id 1 </score>
    </references>
    <references id="2">
        <title>Second references</title>
        <author>
            <person name="Mr. D"/>
            <person name="Mr. E"/>
            <person name="Mr. F"/>
        </author>
        <score> score1 for id 2 </score>
        <score> score2 for id 2 </score>
        <score> score3 for id 2 </score>
    </references>
    <references id="3">
        <title>third references</title>
        <author>
            <person name="Mr. G"/>
            <person name="Mr. H"/>
            <person name="Mr. I"/>
        </author>
        <score> score1 for id 3 </score>
        <score> score2 for id 3 </score>
        <score> score3 for id 3 </score>
    </references>
    <references id="4">
        <title>fourth references</title>
        <author>
            <person name="Mr. J"/>
            <person name="Mr. K"/>
            <person name="Mr. L"/>
        </author>
        <score> score 1 for id 4 </score>
        <score> score 2 for id 4 </score>
        <score> score 3 for id 4 </score>
    </references>
  </entrylevel>
</swissprot>  

I want to print all the references from this XML in a specific format. Desired output:

First Reference
Mr A, Mr B, Mr C
score 1 for id 1, score 2 for id 1, score 3 for id 1

Second Reference
Mr D, Mr E, Mr F
score 1 for id 2, score 2 for id 2, score 3 for id 2

Third Reference
Mr G, Mr H, Mr I
score 1 for id 3, score 2 for id 3, score 3 for id 3

Fourth Reference
Mr J, Mr K, Mr L
score 1 for id 4, score 2 for id 4, score 3 for id 4

I wrote the code below and can get the title values in the correct format, but I cannot get the author information for each individual entry.

import xml.etree.ElementTree as ET
document = ET.parse("recipe.xml")
root = document.getroot()
title = []
author = []
score = []

for i in root.getiterator('title'):
    title.append(i.text)
    for j in root.getiterator('author'):
        author.append(j.text)
        for k in root.getiterator('score'):
            score.append(k.text)

for i, j, k in zip(title, author, score):
    print i, j, k

1 Answer

#1


0  

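Iterate over the references elements, not the title elements. Each references element holds its own title, author and score children, so searching inside it keeps the values grouped per entry. Calling root.getiterator('author') inside the title loop searches the whole document every time, and an author element has no text of its own: the names are in the name attribute of its person children.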
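
As a quick check of where the author data actually lives, here is a minimal sketch. It assumes a trimmed-down excerpt of the question's XML held in a string (the SAMPLE name is only for illustration) instead of the recipe.xml file:

```python
import xml.etree.ElementTree as ET

# Trimmed-down excerpt of the XML from the question (one entry only).
SAMPLE = """
<swissprot created="2010-12-20">
  <entrylevel dataset="abc">
    <references id="1">
      <title>first references</title>
      <author>
        <person name="Mr. A"/>
        <person name="Mr. B"/>
      </author>
      <score> score1 for id 1 </score>
    </references>
  </entrylevel>
</swissprot>
"""

root = ET.fromstring(SAMPLE)
author = root.find(".//author")

# <author> carries no text of its own, only the whitespace before its first child.
print(repr(author.text))

# The names are attributes of the nested <person> elements.
names = [person.get("name") for person in author.iter("person")]
print(names)
```

So appending the text of an author element only collects whitespace, while reading the name attribute of each person element gives the values you are after.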

Modify the following code to suit your needs:

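import xml.etree.ElementTree as ET

document = ET.parse("recipe.xml")
root = document.getroot()
TITLE = 0  # <title> is the first child of each <references> element

# Element.iter() replaces getiterator(), which was removed in Python 3.9
for child in root.iter('references'):
    author = []
    score = []
    # Author names are stored in the name attribute of each <person>
    for k in child.iter('person'):
        author.append(k.get('name'))
    # Score values are the text of each <score> element
    for l in child.iter('score'):
        score.append(l.text)

    print(child[TITLE].text)
    print(', '.join(author))
    print(', '.join(score))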

Output:

first references
Mr. A, Mr. B, Mr. C
 score1 for id 1 ,  score2 for id 1 ,  score3 for id 1 
Second references
Mr. D, Mr. E, Mr. F
 score1 for id 2 ,  score2 for id 2 ,  score3 for id 2 
third references
Mr. G, Mr. H, Mr. I
 score1 for id 3 ,  score2 for id 3 ,  score3 for id 3 
fourth references
Mr. J, Mr. K, Mr. L
 score 1 for id 4 ,  score 2 for id 4 ,  score 3 for id 4
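
The output above keeps the padding around each score's text, which is why the commas look misplaced. One way to tidy that up and leave a blank line between entries, as in the desired output, might look like the sketch below. It assumes Python 3 and a two-entry excerpt of the XML in a string; SAMPLE and format_references are illustrative names, not part of the original code:

```python
import xml.etree.ElementTree as ET

# Two-entry excerpt of the XML from the question, kept inline for the sketch.
SAMPLE = """
<swissprot created="2010-12-20">
  <entrylevel dataset="abc">
    <references id="1">
      <title>first references</title>
      <author>
        <person name="Mr. A"/>
        <person name="Mr. B"/>
      </author>
      <score> score1 for id 1 </score>
      <score> score2 for id 1 </score>
    </references>
    <references id="2">
      <title>Second references</title>
      <author>
        <person name="Mr. D"/>
      </author>
      <score> score1 for id 2 </score>
    </references>
  </entrylevel>
</swissprot>
"""


def format_references(root):
    """Return one three-line text block per <references> element."""
    blocks = []
    for ref in root.iter("references"):
        title = ref.findtext("title", default="").strip()
        # Names are attributes of <person>, nested under <author>.
        authors = [p.get("name") for p in ref.findall("author/person")]
        # strip() removes the padding around each score's text.
        scores = [s.text.strip() for s in ref.findall("score") if s.text]
        blocks.append("\n".join([title, ", ".join(authors), ", ".join(scores)]))
    return blocks


root = ET.fromstring(SAMPLE)
# Separate entries with a blank line.
print("\n\n".join(format_references(root)))
```

To run it against the file from the question, replace ET.fromstring(SAMPLE) with ET.parse("recipe.xml").getroot().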

Read Parsing XML
