Parsing XML with the Java DOM Parser

Date: 2021-12-18 01:00:00

I am new to Java and XML, and I need to fetch some data from an XML file.

Here is my XML:

<?xml version="1.0" encoding="UTF-8"?>
<course name="BSc (Hons) Software Engineering" version="5.0" type="FT" lowerbound="2012" upperbound="2014" >
   <year id="1">
      <semester id="1">
         <module>
            <code>HCA1105C</code>
            <name>Computer Architecture</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>PROG1115C</code>
            <name>Object Oriented Software Development I</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MATH1103C</code>
            <name>Decision Mathematics</name>
            <credits>3</credits>
            <hrs_per_wk>2+1</hrs_per_wk>
         </module>
         <module>
            <code>ITE1107C</code>
            <name>Language and Communication Seminar</name>
            <credits>3</credits>
            <hrs_per_wk>2+1</hrs_per_wk>
         </module>
         <module>
            <code>MGMT1101C</code>
            <name>Management Seminar</name>
            <credits>3</credits>
            <hrs_per_wk>2+1</hrs_per_wk>
         </module>
      </semester>
      <semester id="2">
         <module>
            <code>PROG1116C</code>
            <name>Object Oriented Software Development II</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>WAT1116C</code>
            <name>Internet Programming I</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MATH1101C</code>
            <name>Analytic Methods for Computing</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>DBT1111C</code>
            <name>Database Design</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
      </semester>
   </year>
   <year id="2">
      <semester id="1">
         <module>
            <code>CAN2112C</code>
            <name>Network Design &amp; Programming</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>WAT2117C</code>
            <name>Internet Programming II</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>OSS2109C</code>
            <name>Operating Systems</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>PROG2117C</code>
            <name>Desktop Application Development</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
      </semester>
      <semester id="2">
         <module>
            <code>SDT2114C</code>
            <name>Requirements Engineering</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MATH2323C</code>
            <name>Numerical Methods</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MCT2104C</code>
            <name>Mobile Application Development</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MCT2104C</code>
            <name>Mobile Application Development</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>WAT2124C</code>
            <name>Web Services</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MGMT2104C</code>
            <name>Research &amp; Development Seminar</name>
            <credits>3</credits>
            <hrs_per_wk>2+1</hrs_per_wk>
         </module>
      </semester>
   </year>
   <year id="3">
      <semester id="1">
         <module>
            <code>SECU3119C</code>
            <name>Secure Software Development</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MULT3114C</code>
            <name>Game Development</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>SEM3112C</code>
            <name>Project Management Seminar</name>
            <credits>3</credits>
            <hrs_per_wk>2+1</hrs_per_wk>
         </module>
      </semester>
      <semester id="2">
         <module>
            <code>SDT3104C</code>
            <name>Enterprise Software Development</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>WAT3125C</code>
            <name>Emerging Web Technologies</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>SEM3113C</code>
            <name>Software Quality Management</name>
            <credits>4</credits>
            <hrs_per_wk>2+2</hrs_per_wk>
         </module>
         <module>
            <code>MGMT3105C</code>
            <name>Entrepreneurship Seminar</name>
            <credits>3</credits>
            <hrs_per_wk>2+1</hrs_per_wk>
         </module>
         <module>
            <code>PROJ3105C</code>
            <name>Systems Development Project</name>
            <credits>9</credits>
            <hrs_per_wk />
         </module>
      </semester>
   </year>
</course>

Let's say I want to get the codes of all the modules in year 1, semester 1:

HCA1105C
PROG1115C
MATH1103C
ITE1107C
MGMT1101C

Here is my code so far:

try {
    File inputFile = new File(System.getProperty("user.dir") + "/courses/bse.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();
    NodeList nList = doc.getElementsByTagName("year");
    for (int i = 0; i < nList.getLength(); i++) {
        Node nNode = nList.item(i);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            // if (Integer.parseInt(eElement.getAttribute("id")) == 1) {
            System.out.println(eElement.getElementsByTagName("code").item(0).getTextContent());
            // }
        }
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
    System.exit(1);
}

I get the following output:

HCA1105C
CAN2112C
SECU3119C

3 Answers

#1


1  

Checking the child nodes and descending into the modules gives the expected result, as shown below:

import java.io.File;
import javax.swing.JOptionPane;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// The class name is arbitrary; it only wraps main so the snippet compiles.
public class Snippet {
    public static void main(String[] args) {
        try {
            File inputFile = new File("Snippet.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(inputFile);
            doc.getDocumentElement().normalize();
            NodeList yearList = doc.getElementsByTagName("year");
            for (int i = 0; i < yearList.getLength(); i++) {
                Node yearNode = yearList.item(i);
                if (yearNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element yearElement = (Element) yearNode;
                    if (Integer.parseInt(yearElement.getAttribute("id")) == 1) { // Found year 1
                        // getChildNodes() also returns whitespace text nodes, hence the type checks
                        NodeList semesterList = yearElement.getChildNodes();
                        for (int j = 0; j < semesterList.getLength(); j++) {
                            Node semesterNode = semesterList.item(j);
                            if (semesterNode.getNodeType() == Node.ELEMENT_NODE) {
                                Element semesterElement = (Element) semesterNode;
                                if (Integer.parseInt(semesterElement.getAttribute("id")) == 1) { // Found semester 1
                                    NodeList moduleList = semesterElement.getChildNodes();
                                    for (int k = 0; k < moduleList.getLength(); k++) {
                                        Node moduleNode = moduleList.item(k);
                                        if (moduleNode.getNodeType() == Node.ELEMENT_NODE) {
                                            Element moduleElement = (Element) moduleNode;
                                            // Print the text of this module's <code> element
                                            System.out.println(moduleElement.getElementsByTagName("code").item(0).getTextContent());
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
            System.exit(1);
        }
    }
}
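
The nested loops above repeat one pattern three times: list the children, skip non-element nodes, check a condition. As a rough sketch only, that pattern could be pulled into a small helper. The names DirectChildWalk, children and codes, and the trimmed inline XML, are illustrative assumptions and not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the inline XML is a trimmed stand-in for the course file.
class DirectChildWalk {
    static final String XML =
          "<course>\n"
        + "  <year id=\"1\">\n"
        + "    <semester id=\"1\">\n"
        + "      <module><code>HCA1105C</code></module>\n"
        + "      <module><code>PROG1115C</code></module>\n"
        + "    </semester>\n"
        + "    <semester id=\"2\">\n"
        + "      <module><code>PROG1116C</code></module>\n"
        + "    </semester>\n"
        + "  </year>\n"
        + "  <year id=\"2\">\n"
        + "    <semester id=\"1\">\n"
        + "      <module><code>CAN2112C</code></module>\n"
        + "    </semester>\n"
        + "  </year>\n"
        + "</course>";

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Direct child elements of parent with the given tag; whitespace text nodes are skipped.
    static List<Element> children(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Module codes for one year and semester, walking year -> semester -> module -> code.
    static List<String> codes(Document doc, String yearId, String semesterId) {
        List<String> codes = new ArrayList<>();
        for (Element year : children(doc.getDocumentElement(), "year")) {
            if (!yearId.equals(year.getAttribute("id"))) continue;
            for (Element semester : children(year, "semester")) {
                if (!semesterId.equals(semester.getAttribute("id"))) continue;
                for (Element module : children(semester, "module")) {
                    for (Element code : children(module, "code")) {
                        codes.add(code.getTextContent().trim());
                    }
                }
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        for (String code : codes(parse(XML), "1", "1")) {
            System.out.println(code);
        }
    }
}
```

Comparing the id attribute as a string avoids a NumberFormatException if an element has no id, at the cost of not treating "01" and "1" as equal.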

#2


2  

Your code reads only the first module of each year. The node list for the tag name you specified contains three nodes (year 1, year 2, year 3), and item(0) returns only the first code element found under each of them.
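
To see this behaviour in isolation, the following sketch counts the code descendants of each year element and prints the one that item(0) selects. It assumes a trimmed inline copy of the XML rather than the real file; the class name DescendantSearchDemo and its helper methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; the inline XML is a trimmed stand-in for the course file.
class DescendantSearchDemo {
    static final String XML =
          "<course>"
        + "<year id=\"1\">"
        + "<semester id=\"1\"><module><code>HCA1105C</code></module>"
        + "<module><code>PROG1115C</code></module></semester>"
        + "<semester id=\"2\"><module><code>PROG1116C</code></module></semester>"
        + "</year>"
        + "<year id=\"2\">"
        + "<semester id=\"1\"><module><code>CAN2112C</code></module></semester>"
        + "</year>"
        + "</course>";

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Number of <code> descendants (at any depth) under the year at the given index.
    static int codeCount(Document doc, int yearIndex) {
        Element year = (Element) doc.getElementsByTagName("year").item(yearIndex);
        return year.getElementsByTagName("code").getLength();
    }

    // Text of the first <code> descendant, which is all that item(0) returns.
    static String firstCode(Document doc, int yearIndex) {
        Element year = (Element) doc.getElementsByTagName("year").item(yearIndex);
        return year.getElementsByTagName("code").item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        int years = doc.getElementsByTagName("year").getLength();
        for (int i = 0; i < years; i++) {
            System.out.println(codeCount(doc, i) + " code element(s), item(0) is " + firstCode(doc, i));
        }
    }
}
```

getElementsByTagName searches all descendants, not just direct children, so every code in the year is in the list; the question's loop simply never looks past index 0.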

To read all the modules of year 1, you need to descend into the subtree of the year element with id="1". That gives you a node list of semesters, and you then descend further into the children of the semester with id="1".

Alternatively, you can query with XPath, which lets you select the modules of year 1, semester 1 directly.

http://viralpatel.net/blogs/java-xml-xpath-tutorial-parse-xml/

Edit: modified code using XPath:

// Needs imports from java.io, javax.swing, javax.xml.parsers, javax.xml.xpath and org.w3c.dom.
try {
    File inputFile = new File("courses.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();

    XPath xPath = XPathFactory.newInstance().newXPath();
    // Select every <code> under year 1, semester 1
    String expression = "/course/year[@id=1]/semester[@id=1]/module/code";
    NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
    System.out.println(expression);
    for (int i = 0; i < nodeList.getLength(); i++) {
        System.out.println(nodeList.item(i).getTextContent());
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
    System.exit(1);
}
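
If the year and semester need to vary, the same XPath query could be wrapped in a method that takes them as parameters and returns the codes as a list. This is only a sketch: the class XPathModuleCodes, the method codes, and the trimmed inline XML are assumptions made for the example, and the expression is built by plain string concatenation, which is acceptable here only because the inputs are ints.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the inline XML is a trimmed stand-in for the course file.
class XPathModuleCodes {
    static final String XML =
          "<course>"
        + "<year id=\"1\">"
        + "<semester id=\"1\"><module><code>HCA1105C</code></module>"
        + "<module><code>PROG1115C</code></module></semester>"
        + "<semester id=\"2\"><module><code>PROG1116C</code></module></semester>"
        + "</year>"
        + "<year id=\"2\">"
        + "<semester id=\"1\"><module><code>CAN2112C</code></module></semester>"
        + "</year>"
        + "</course>";

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Module codes for the given year and semester, selected with a single XPath query.
    static List<String> codes(Document doc, int year, int semester) {
        // Safe to concatenate because both values are ints, not user-supplied strings.
        String expression = "/course/year[@id='" + year + "']/semester[@id='" + semester + "']/module/code";
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(codes(parse(XML), 1, 1));
    }
}
```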

#3


0  

We can fetch all the codes through the DOM with the following code:

try {
    File inputFile = new File("src/resources/res.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();
    NodeList nList = doc.getElementsByTagName("module");
    for (int i = 0; i < nList.getLength(); i++) {
        Node nNode = nList.item(i);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            System.out.println(eElement.getElementsByTagName("code").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
    System.exit(1);
}

We can also fetch the codes by looping over each year, then each semester, then each module, and reading the code element of each module. The code above gives the following result:

HCA1105C PROG1115C MATH1103C ITE1107C MGMT1101C PROG1116C WAT1116C MATH1101C DBT1111C CAN2112C WAT2117C OSS2109C PROG2117C SDT2114C MATH2323C MCT2104C MCT2104C WAT2124C MGMT2104C SECU3119C MULT3114C SEM3112C SDT3104C WAT3125C SEM3113C MGMT3105C PROJ3105C
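
The flat list above covers every year and semester. One possible way to narrow it to a single semester while still starting from getElementsByTagName("module") is to check each module's parent and grandparent, as in the sketch below. The class ParentFilter, its method names, and the trimmed inline XML are illustrative assumptions, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the inline XML is a trimmed stand-in for the course file.
class ParentFilter {
    static final String XML =
          "<course>"
        + "<year id=\"1\">"
        + "<semester id=\"1\"><module><code>HCA1105C</code></module>"
        + "<module><code>PROG1115C</code></module></semester>"
        + "<semester id=\"2\"><module><code>PROG1116C</code></module></semester>"
        + "</year>"
        + "<year id=\"2\">"
        + "<semester id=\"1\"><module><code>CAN2112C</code></module></semester>"
        + "</year>"
        + "</course>";

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // True if node is an element with the given tag name and id attribute.
    static boolean matches(Node node, String tag, String id) {
        if (!(node instanceof Element)) {
            return false;
        }
        Element element = (Element) node;
        return tag.equals(element.getTagName()) && id.equals(element.getAttribute("id"));
    }

    // Codes of the modules whose parent is the given semester inside the given year.
    static List<String> codes(Document doc, String yearId, String semesterId) {
        List<String> result = new ArrayList<>();
        NodeList modules = doc.getElementsByTagName("module");
        for (int i = 0; i < modules.getLength(); i++) {
            Element module = (Element) modules.item(i);
            Node semester = module.getParentNode();
            Node year = (semester == null) ? null : semester.getParentNode();
            if (matches(semester, "semester", semesterId) && matches(year, "year", yearId)) {
                NodeList code = module.getElementsByTagName("code");
                if (code.getLength() > 0) {
                    result.add(code.item(0).getTextContent().trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String code : codes(parse(XML), "1", "1")) {
            System.out.println(code);
        }
    }
}
```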
