Generating PASCAL VOC-format XML annotation files with Python

Date: 2023-03-08 16:36:03

Annotation files in the PASCAL VOC dataset are in XML format. For py-faster-rcnn, the fields in the following example are usually sufficient:

<annotation>
    <folder>GTSDB</folder>
    <filename>000001.jpg</filename>
    <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <object>
        <name>mouse</name>
        <difficult>0</difficult>
        <bndbox>
            <xmin>99</xmin>
            <ymin>358</ymin>
            <xmax>135</xmax>
            <ymax>375</ymax>
        </bndbox>
    </object>
</annotation>

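How can we read bounding-box information from a CSV or TXT file and generate an XML annotation file? Writing the file line by line certainly works, but it is inconvenient to modify later, and switching between tabs and spaces for indentation is awkward too.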

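The xml.etree.ElementTree package works well for both parsing and generating XML. However, pretty-printing its output in the usual way (through xml.dom.minidom) produces a header line with version information, <?xml version="1.0" ?>, which we do not need. Using the lxml package in place of the standard xml package avoids it: lxml's tostring() can pretty-print directly and emits no XML declaration by default.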
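
As a quick check of where that header line comes from, here is a minimal sketch using only the standard library. It assumes the pretty-printing step goes through xml.dom.minidom, which is a guess based on the imports in the example code; the element names are borrowed from the annotation above.

```python
from xml.dom.minidom import parseString
from xml.etree.ElementTree import Element, SubElement, tostring

root = Element('annotation')
SubElement(root, 'folder').text = 'GTSDB'

# ElementTree's own tostring() returns bytes with no XML declaration by default.
raw = tostring(root)

# Pretty-printing through minidom is the step that prepends the version header.
pretty = parseString(raw).toprettyxml(indent='  ')

print(raw.decode('ascii'))
print(pretty)
```

If this holds in your environment, the header is a side effect of the pretty-printer rather than of ElementTree itself.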

An example is given below.

Installing dependencies

sudo pip install lxml

Example code for generating the XML

#!/usr/bin/env python
# coding: utf-8

# from xml.etree.ElementTree import Element, SubElement, tostring
from lxml.etree import Element, SubElement, tostring

node_root = Element('annotation')

node_folder = SubElement(node_root, 'folder')
node_folder.text = 'GTSDB'

node_filename = SubElement(node_root, 'filename')
node_filename.text = '000001.jpg'

# Image size
node_size = SubElement(node_root, 'size')
node_width = SubElement(node_size, 'width')
node_width.text = '500'
node_height = SubElement(node_size, 'height')
node_height.text = '375'
node_depth = SubElement(node_size, 'depth')
node_depth.text = '3'

# One <object> element per bounding box
node_object = SubElement(node_root, 'object')
node_name = SubElement(node_object, 'name')
node_name.text = 'mouse'
node_difficult = SubElement(node_object, 'difficult')
node_difficult.text = '0'
node_bndbox = SubElement(node_object, 'bndbox')
node_xmin = SubElement(node_bndbox, 'xmin')
node_xmin.text = '99'
node_ymin = SubElement(node_bndbox, 'ymin')
node_ymin.text = '358'
node_xmax = SubElement(node_bndbox, 'xmax')
node_xmax.text = '135'
node_ymax = SubElement(node_bndbox, 'ymax')
node_ymax.text = '375'

# pretty_print=True adds line breaks and indentation to the output
xml = tostring(node_root, pretty_print=True)
# tostring() returns bytes, so decode before printing
print(xml.decode('utf-8'))
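
The hard-coded example above can be generalized to build annotations from CSV rows. The following is only a sketch under stated assumptions: the CSV column layout (filename, width, height, depth, name, xmin, ymin, xmax, ymax, one row per box) and the helper names add_text and build_annotation are hypothetical, and every box is assumed to have difficult set to 0. It uses the standard-library ElementTree so that it runs without extra installs.

```python
import csv
import io
from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement, tostring


def add_text(parent, tag, value):
    """Append <tag>value</tag> to parent and return the new element."""
    node = SubElement(parent, tag)
    node.text = str(value)
    return node


def build_annotation(folder, filename, rows):
    """Build one <annotation> element from the CSV rows of a single image."""
    root = Element('annotation')
    add_text(root, 'folder', folder)
    add_text(root, 'filename', filename)
    size = SubElement(root, 'size')
    for tag in ('width', 'height', 'depth'):
        add_text(size, tag, rows[0][tag])  # image size is repeated on every row
    for row in rows:
        obj = SubElement(root, 'object')
        add_text(obj, 'name', row['name'])
        add_text(obj, 'difficult', 0)  # assumption: no box is marked difficult
        bndbox = SubElement(obj, 'bndbox')
        for tag in ('xmin', 'ymin', 'xmax', 'ymax'):
            add_text(bndbox, tag, row[tag])
    return root


# Hypothetical CSV content; in practice this would be read from a file.
CSV_TEXT = """filename,width,height,depth,name,xmin,ymin,xmax,ymax
000001.jpg,500,375,3,mouse,99,358,135,375
000001.jpg,500,375,3,keyboard,10,20,200,120
"""

# Group the rows by image so each image yields a single annotation.
rows_by_image = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    rows_by_image[row['filename']].append(row)

annotations = {
    name: build_annotation('GTSDB', name, rows)
    for name, rows in rows_by_image.items()
}
print(tostring(annotations['000001.jpg']).decode('ascii'))
```

The Element and SubElement calls should carry over unchanged to lxml.etree, where tostring(root, pretty_print=True) would produce the indented output.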

Parsing a VOC2007 XML file with lxml

from lxml import etree


class BndBox(object):
    def __init__(self, x1=0, y1=0, x2=0, y2=0, cls=None):
        self.x1 = x1
        self.y1 = y1
        self.x2 = x2
        self.y2 = y2
        self.cls_name = cls  # class name


def test_parsing(xml_pth):
    xml_desc = etree.parse(xml_pth)
    for obj in xml_desc.xpath('//object'):
        box = BndBox()  # a fresh box for each <object>
        for item in obj:  # iterate over the child elements
            if item.tag == 'name':
                box.cls_name = item.text
            elif item.tag == 'bndbox':
                # Relies on the children being ordered xmin, ymin, xmax, ymax
                coords = [int(float(_.text)) for _ in item]
                box.x1, box.y1, box.x2, box.y2 = coords
        print(box.cls_name, box.x1, box.y1, box.x2, box.y2)


if __name__ == '__main__':
    # draw_labels('datasetTraffic')
    test_parsing('H:/zz_dataset/datasetTraffic/Annotations/2012_004317.xml')
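
The parser above unpacks the coordinates by position, so it depends on the order of the children inside bndbox. One possible variation, sketched here with the standard-library ElementTree, looks up each coordinate by tag name and returns one tuple per object. The function name parse_boxes and the inline sample are hypothetical; the sample deliberately lists ymin before xmin.

```python
from xml.etree.ElementTree import fromstring

SAMPLE = """<annotation>
  <object>
    <name>mouse</name>
    <bndbox>
      <ymin>358</ymin>
      <xmin>99</xmin>
      <xmax>135</xmax>
      <ymax>375</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_boxes(root):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples, one per <object>."""
    boxes = []
    for obj in root.iter('object'):
        bndbox = obj.find('bndbox')
        # Look up each coordinate by tag so the child order does not matter.
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.findtext('name'), *coords))
    return boxes


boxes = parse_boxes(fromstring(SAMPLE))
print(boxes)  # [('mouse', 99, 358, 135, 375)]
```

The find, findtext and iter methods are also available on lxml elements, so the same function should work on etree.parse(xml_pth).getroot().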