Parsing XML files with Java when the file path contains spaces

Time: 2022-12-01 13:57:30

I have files on my file system, on Windows XP. I want to parse them using Java (JRE 1.6).

Problem is, I don't understand how Java and Xerces work together when the file path has spaces in it.

If the file has no spaces in its path, all works fine.

If there are spaces, I may get this kind of error, even when I call the parser with a FileInputStream instance:

java.net.UnknownHostException: .
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

(sun.net.ftp.FtpClient.openServer??? WTF?)
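A guess at where the FTP client comes from, not confirmed by the trace: as far as I know, the Sun/Oracle JDK hands a file: URL whose host part is neither empty nor "localhost" to its FTP handler, and the UnknownHostException message "." would then be that host name. The hypothetical helper below only shows how java.net.URL splits two such specs; what the real DTD reference in the document looks like is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo: shows which part of a file: URL java.net.URL treats as the host.
class FileUrlHostDemo {

    // Returns the host component that java.net.URL parses out of the spec.
    static String hostOf(String spec) {
        try {
            return new URL(spec).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Three slashes: the host is empty, so the URL names a local file.
        System.out.println("[" + hostOf("file:///d:/folder with space/file.dtd") + "]");
        // Two slashes followed by ".": the "." is parsed as a host name.
        System.out.println("[" + hostOf("file://./folder with space/file.dtd") + "]");
    }
}
```

If the system identifier really does end up in the second form, the JDK would try to reach a machine called ".", which matches the first stack trace.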

Or else this kind of error:

java.net.MalformedURLException: unknown protocol: d
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)

(I guess it says "unknown protocol: d" because the file is on the D: drive.)
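That guess looks right: java.net.URL takes everything before the first colon as the protocol (lower-cased), so a plain Windows path is read with its drive letter as the scheme. A minimal sketch that reproduces the message; the class and the helper name urlError are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo: a Windows path handed to java.net.URL is read as "<protocol>:<rest>".
class DriveLetterAsSchemeDemo {

    // Returns the MalformedURLException message for the spec, or null if it parsed fine.
    static String urlError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The drive letter "D" becomes the protocol "d", for which no handler exists.
        System.out.println(urlError("D:\\folder with space\\file.dtd"));
        // An escaped file: URI parses without complaint.
        System.out.println(urlError("file:///D:/folder%20with%20space/file.dtd"));
    }
}
```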

Does anyone have a clue why this happens, and how to work around the problem? I tried to supply my own EntityResolver, but my log tells me it is not even called before the crash.

EDIT:

Here is the code calling the parser.

public Document fileToDom(File file) throws ProcessException {
    Document doc = null;
    try {
        DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = db.newDocumentBuilder();
        if (this.errorHandler != null) {
            builder.setErrorHandler(this.errorHandler);
        } else {
            builder.setErrorHandler(new DefaultHandler());
        }
        FileInputStream test = new FileInputStream(file);
        doc = builder.parse(test);
        ...
    } catch (Exception e) {...}
    ...
}

For the moment I find myself forced to remove the DOCTYPE before parsing. That makes all the problems go away, but it also removes the DTD validation... Not a great solution.
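A less invasive version of the same workaround, assuming the parser is the JDK's bundled Xerces (or Apache Xerces): the Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd, set to false, tells a non-validating parser not to open the external DTD at all, so the file can stay untouched. Like stripping the DOCTYPE, it gives up validation and anything the DTD declares (entities, default attributes). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parse without opening the external DTD named in the DOCTYPE.
class SkipExternalDtdDemo {

    // Xerces feature understood by the JDK's bundled parser; false = never load the external DTD.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parseIgnoringDtd(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // the default; the feature only applies to non-validating parsing
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE names a DTD that does not exist; with the feature off it is never opened.
        String xml = "<!DOCTYPE root SYSTEM \"no such folder/missing.dtd\"><root>hello</root>";
        Document doc = parseIgnoringDtd(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Other JAXP implementations may not recognise the feature, in which case setFeature throws ParserConfigurationException.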

4 solutions

#1


Are you just using DocumentBuilder.parse(filename)?

If so, that fails because parse(String) expects a URI, not a file path. Open a FileInputStream on the file, and then pass that to DocumentBuilder.parse(InputStream).
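One catch: a bare stream carries no location, so a relative DTD reference in the DOCTYPE is, as far as I can tell, resolved against the working directory rather than the file's folder. The two-argument overload DocumentBuilder.parse(InputStream, String systemId) supplies that location. A sketch, assuming the DTD sits next to the XML file; the helper parseWithBase and the sample files are hypothetical. The helper only uses Java 6 APIs; main uses java.nio.file (Java 7+) just to create the sample.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parse a stream but tell the parser where the document lives.
class ParseWithBaseDemo {

    static Document parseWithBase(File file)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream in = new FileInputStream(file);
        try {
            // File.toURI() percent-encodes the spaces; relative references in the
            // DOCTYPE are then resolved against the file's own folder.
            return builder.parse(in, file.toURI().toString());
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample data: a folder whose name contains spaces, holding a DTD and an XML file.
        Path dir = Files.createTempDirectory("folder with space");
        Files.write(dir.resolve("greeting.dtd"),
                "<!ENTITY greet \"hello\">".getBytes(StandardCharsets.UTF_8));
        Path xml = dir.resolve("file.xml");
        Files.write(xml, "<!DOCTYPE root SYSTEM \"greeting.dtd\"><root>&greet;</root>"
                .getBytes(StandardCharsets.UTF_8));
        // The entity is declared in the DTD, so "hello" only appears if the DTD was found.
        System.out.println(parseWithBase(xml.toFile()).getDocumentElement().getTextContent());
    }
}
```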

#2


Try this URI style:

file:///d:/folder/folder%20with%20space/file.xml
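Rather than escaping the path by hand, java.io.File can build that URI and percent-encode the spaces itself. A small sketch; toFileUri is a hypothetical helper name, and the resulting string is what you would pass to DocumentBuilder.parse(String uri).

```java
import java.io.File;

// Hypothetical helper: let java.io.File build the escaped file: URI instead of typing it by hand.
class FileUriDemo {

    // File.toURI() makes the path absolute and percent-encodes characters such as spaces.
    static String toFileUri(File file) {
        return file.toURI().toString();
    }

    public static void main(String[] args) {
        File f = new File("folder with space", "file.xml");
        // e.g. file:/D:/work/folder%20with%20space/file.xml on Windows
        System.out.println(toFileUri(f));
    }
}
```

File.toURI() writes the scheme as "file:/" with a single slash, which is another valid spelling of the same URI.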

#3


It looks like it's trying to connect to the URL in the DOCTYPE header so that it can download the DTD and validate the document against it.
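If that is what happens, an EntityResolver can answer the DTD request from a local copy so nothing is fetched. A sketch under these assumptions: the DTD URL, its content, and the helper names are hypothetical, and the resolver is registered with DocumentBuilder.setEntityResolver before parse() is called. Returning null from resolveEntity lets the parser fall back to its default behaviour.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: answer the DTD request locally instead of letting the parser fetch it.
class LocalDtdResolverDemo {

    // Serves any entity whose system ID ends with dtdName from dtdText;
    // returning null for everything else keeps the parser's default behaviour.
    static EntityResolver inMemoryDtd(final String dtdName, final String dtdText) {
        return new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                if (systemId != null && systemId.endsWith(dtdName)) {
                    return new InputSource(new StringReader(dtdText));
                }
                return null;
            }
        };
    }

    static Document parse(String xml, EntityResolver resolver)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver); // has to be registered before parse()
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE root SYSTEM \"http://example.com/dtd/greeting.dtd\">"
                + "<root>&greet;</root>";
        EntityResolver resolver = inMemoryDtd("greeting.dtd", "<!ENTITY greet \"hello\">");
        // Prints the entity value declared in the in-memory DTD.
        System.out.println(parse(xml, resolver).getDocumentElement().getTextContent());
    }
}
```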

#4


Try this.

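InputSource is = new InputSource();
// 'test' is the FileInputStream from the question; passing it as a byte stream
// lets the parser detect the encoding from the XML declaration.
is.setByteStream(test);
doc = builder.parse(is);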

instead of passing 'test' straight to builder.parse().
