Python在没有第三方库的情况下操作和保存XML

时间:2023-01-18 00:26:59

I have a xml in which I have to search for a tag and replace the value of tag with a new values. For example,

我有一个xml,在这个xml中,我必须搜索一个标签,并用一个新的值替换标签的值。例如,

<tag-Name>oldName</tag-Name>  

and replace oldName to newName like

并将旧名替换为newName like

<tag-Name>newName</tag-Name>    

and save the xml. How can I do that without using thirdparty lib like BeautifulSoup

并保存xml。我怎么能做到不使用像漂亮的汤这样的三党*

Thank you

谢谢你!

3 个解决方案

#1


5  

The best option from the standard lib is (I think) the xml.etree package.

标准库的最佳选项(我认为)是xml。etree包。

Assuming that your example tag occurs only once somewhere in the document:

假设示例标记只出现在文档中的某个地方一次:

import xml.etree.ElementTree as etree
# or for a faster C implementation
# import xml.etree.cElementTree as etree

tree = etree.parse('input.xml')
elem = tree.find('//tag-Name') # finds the first occurrence of element tag-Name
elem.text = 'newName'
tree.write('output.xml')

Or if there are multiple occurrences of tag-Name, and you want to change them all if they have "oldName" as content:

或者,如果有多个标记名出现,如果它们的内容是“oldName”,那么您希望将它们全部更改:

import xml.etree.cElementTree as etree

tree = etree.parse('input.xml')
for elem in tree.findall('//tag-Name'):
    if elem.text == 'oldName':
        elem.text = 'newName'
# some output options for example
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

#2


0  

Python has 'builtin' libraries for working with xml. For this simple task I'd look into minidom. You can find the docs here:

Python有用于处理xml的“内置”库。对于这个简单的任务,我将研究minidom。你可以在这里找到医生:

http://docs.python.org/library/xml.dom.minidom.html

http://docs.python.org/library/xml.dom.minidom.html

#3


0  

If you are certain, I mean, completely 100% positive that the string <tag-Name> will never appear inside that tag and the XML will always be formatted like that, you can always use good old string manipulation tricks like:

如果您确信,我的意思是,100%肯定的是,字符串 将永远不会出现在该标记中,XML的格式将始终是这样的,您总是可以使用一些很好的旧的字符串处理技巧,比如:

xmlstring = xmlstring.replace('<tag-Name>oldName</tag-Name>', '<tag-Name>newName</tag-Name>')

If the XML isn't always as conveniently formatted as <tag>value</tag>, you could write something like:

如果XML的格式不总是像 value 那样方便,您可以这样写:

a = """<tag-Name>


oldName


    </tag-Name>"""

def replacetagvalue(xmlstring, tag, oldvalue, newvalue):
    start = 0
    while True:
        try:
            start = xmlstring.index('<%s>' % tag, start) + 2 + len(tag)
        except ValueError:
            break
        end = xmlstring.index('</%s>' % tag, start)
        value = xmlstring[start:end].strip()
        if value == oldvalue:
            xmlstring = xmlstring[:start] + newvalue + xmlstring[end:]
    return xmlstring

print replacetagvalue(a, 'tag-Name', 'oldName', 'newName')

Others have already mentioned xml.dom.minidom which is probably a better idea if you're not absolutely certain beyond any doubt that your XML will be so simple. If you can guarantee that, though, remember that XML is just a big chunk of text that you can manipulate as needed.

其他人已经提到了xml.dom。如果您不确定您的XML会如此简单,那么minidom可能是一个更好的主意。但是,如果您能保证这一点,请记住XML只是一大块文本,您可以根据需要进行操作。

I use similar tricks in some production code where simple checks like if "somevalue" in htmlpage are much faster and more understandable than invoking BeautifulSoup, lxml, or other *ML libraries.

在一些生产代码中,我使用了类似的技巧,比如在htmlpage中“somevalue”这样的简单检查比调用漂亮的soup、lxml或其他*ML库要快得多,也更容易理解。

#1


5  

The best option from the standard lib is (I think) the xml.etree package.

标准库的最佳选项(我认为)是xml。etree包。

Assuming that your example tag occurs only once somewhere in the document:

假设示例标记只出现在文档中的某个地方一次:

import xml.etree.ElementTree as etree
# or for a faster C implementation
# import xml.etree.cElementTree as etree

tree = etree.parse('input.xml')
elem = tree.find('//tag-Name') # finds the first occurrence of element tag-Name
elem.text = 'newName'
tree.write('output.xml')

Or if there are multiple occurrences of tag-Name, and you want to change them all if they have "oldName" as content:

或者,如果有多个标记名出现,如果它们的内容是“oldName”,那么您希望将它们全部更改:

import xml.etree.cElementTree as etree

tree = etree.parse('input.xml')
for elem in tree.findall('//tag-Name'):
    if elem.text == 'oldName':
        elem.text = 'newName'
# some output options for example
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

#2


0  

Python has 'builtin' libraries for working with xml. For this simple task I'd look into minidom. You can find the docs here:

Python有用于处理xml的“内置”库。对于这个简单的任务,我将研究minidom。你可以在这里找到医生:

http://docs.python.org/library/xml.dom.minidom.html

http://docs.python.org/library/xml.dom.minidom.html

#3


0  

If you are certain, I mean, completely 100% positive that the string <tag-Name> will never appear inside that tag and the XML will always be formatted like that, you can always use good old string manipulation tricks like:

如果您确信,我的意思是,100%肯定的是,字符串 将永远不会出现在该标记中,XML的格式将始终是这样的,您总是可以使用一些很好的旧的字符串处理技巧,比如:

xmlstring = xmlstring.replace('<tag-Name>oldName</tag-Name>', '<tag-Name>newName</tag-Name>')

If the XML isn't always as conveniently formatted as <tag>value</tag>, you could write something like:

如果XML的格式不总是像 value 那样方便,您可以这样写:

a = """<tag-Name>


oldName


    </tag-Name>"""

def replacetagvalue(xmlstring, tag, oldvalue, newvalue):
    start = 0
    while True:
        try:
            start = xmlstring.index('<%s>' % tag, start) + 2 + len(tag)
        except ValueError:
            break
        end = xmlstring.index('</%s>' % tag, start)
        value = xmlstring[start:end].strip()
        if value == oldvalue:
            xmlstring = xmlstring[:start] + newvalue + xmlstring[end:]
    return xmlstring

print replacetagvalue(a, 'tag-Name', 'oldName', 'newName')

Others have already mentioned xml.dom.minidom which is probably a better idea if you're not absolutely certain beyond any doubt that your XML will be so simple. If you can guarantee that, though, remember that XML is just a big chunk of text that you can manipulate as needed.

其他人已经提到了xml.dom。如果您不确定您的XML会如此简单,那么minidom可能是一个更好的主意。但是,如果您能保证这一点,请记住XML只是一大块文本,您可以根据需要进行操作。

I use similar tricks in some production code where simple checks like if "somevalue" in htmlpage are much faster and more understandable than invoking BeautifulSoup, lxml, or other *ML libraries.

在一些生产代码中,我使用了类似的技巧,比如在htmlpage中“somevalue”这样的简单检查比调用漂亮的soup、lxml或其他*ML库要快得多,也更容易理解。