What is the best way to split a string when the prefix delimiter and the suffix delimiter are different?

Time: 2022-12-22 21:41:08

In Java, what is the best way to split a string into an array of blocks, when the delimiters at the beginning of each block are different from the delimiters at the end of each block?

For example, suppose I have String string = "abc 1234 xyz abc 5678 xyz".

I want to apply some sort of complex split in order to obtain {"1234","5678"}.

The first thing that comes to mind is:

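String[] parts = string.split("abc");
for (String part : parts)
{
    // split() yields a leading "" because the string starts with "abc"; skip it
    if (part.trim().isEmpty()) continue;
    String[] blocks = part.split("xyz");
    String data = blocks[0].trim(); // e.g. "1234"
    // Do some stuff with the 'data' string
}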

Is there a simpler / cleaner / more efficient way of doing it?

My purpose (as you've probably guessed) is to parse an XML document.

I want to split a given XML string into the Inner-XML blocks of a given tag.

For example:

String xml = "<tag>ABC</tag>White Spaces Only<tag>XYZ</tag>";
String[] blocks = Split(xml,"<tag>","</tag>"); // should be {"ABC","XYZ"}

How would you implement String[] Split(String str,String prefix,String suffix)?

Thanks

4 solutions

#1


The best approach is to use one of the dedicated XML parsers. See this discussion about the best XML parser for Java.

I found this DOM XML parser example to be a simple and good one.
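Since the linked example is not reproduced here, a minimal DOM sketch of the same idea might look like the following. It is only an illustration under stated assumptions: the class name `DomInnerText` and the helper `innerText` are hypothetical, and the fragment is wrapped in a made-up `<root>` element because a string like the one in the question has no single root and would not parse as a document on its own. Note that `getTextContent()` returns the text of each element, not its inner markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomInnerText {

    // Hypothetical helper: returns the text content of every <tagName> element.
    static List<String> innerText(String xmlFragment, String tagName) throws Exception {
        // Assumption: the input is a fragment without an XML declaration,
        // so it can be wrapped in an artificial root element.
        String wrapped = "<root>" + xmlFragment + "</root>";

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

        // Collect the text of each matching element, in document order.
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tag>ABC</tag> <tag>XYZ</tag>";
        System.out.println(innerText(xml, "tag")); // prints [ABC, XYZ]
    }
}
```

Unlike string splitting, this keeps working when the tag carries attributes or the content contains entities such as `&amp;`.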

#2


IMHO the best solution is to parse the XML file, which is not a one-line thing...

Look here

Here is sample code from another question on SO that parses the document and then navigates it with XPath:

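import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        String xml = "<resp><status>good</status><msg>hi</msg></resp>";

        // Parse the XML string into a DOM document
        InputSource source = new InputSource(new StringReader(xml));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(source);

        // Evaluate XPath expressions against the parsed document
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        String msg = xpath.evaluate("/resp/msg", document);
        String status = xpath.evaluate("/resp/status", document);

        System.out.println("msg=" + msg + ";" + "status=" + status);
    }
}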

Complete thread of this post here

#3


You can write a regular expression for this type of string…

How about something like \s*((^abc)|(xyz\s*abc)|(\s*xyz$))\s* which says abc at the beginning, or xyz at the end, or xyz followed by abc in the middle (modulo some spaces)? This produces an empty value at the beginning, but aside from that, it seems like it'd do what you want.

import java.util.Arrays;

public class RegexDelimitersExample {
    public static void main(String[] args) {
        final String string = "abc 1234 xyz abc 5678 xyz";
        final String pattern = "\\s*((^abc)|(xyz\\s*abc)|(\\s*xyz$))\\s*";
        final String[] parts_ = string.split( pattern );
        // parts_[0] is "", because there's nothing before ^abc,
        // so a copy of the rest of the array is what we want.
        final String[] parts = Arrays.copyOfRange( parts_, 1, parts_.length );
        System.out.println( Arrays.deepToString( parts ));
    }
}
Output: [1234, 5678]

Depending on how you want to handle spaces, you could adjust this as necessary. E.g.,

\s*((^abc)|(xyz\s*abc)|(\s*xyz$))\s*     # original
(^abc\s*)|(\s*xyz\s*abc\s*)|(\s*xyz$)    # no spaces on outside
...                                      # ...

…but you shouldn't use it for XML.

As I noted in the comments, though, this will work for splitting a non-nested string that has these sorts of delimiters. You won't be able to handle nested cases (e.g., abc abc 12345 xyz xyz) using regular expressions, so you won't be able to handle general XML (which seemed to be your intent). If you actually need to parse XML, use a tool designed for XML (e.g., a parser, an XPath query, etc.).
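For the flat, non-nested case only, a rough sketch of the `Split(str, prefix, suffix)` helper from the question could capture what lies between each prefix/suffix pair instead of splitting on the delimiters. The class name `DelimiterSplit` is hypothetical, and `Pattern.DOTALL` is my assumption so that a block may span several lines. The same caveat applies: this is not a way to parse XML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSplit {

    // Returns the text between each prefix and the nearest following suffix.
    // Assumes delimiters are not nested; nested input gives misleading results.
    static String[] split(String str, String prefix, String suffix) {
        // Pattern.quote treats the delimiters as literal text, so characters
        // such as '<', '/' or '.' in them have no regex meaning.
        // The reluctant (.*?) stops at the first suffix after each prefix.
        Pattern pattern = Pattern.compile(
                Pattern.quote(prefix) + "(.*?)" + Pattern.quote(suffix),
                Pattern.DOTALL);

        Matcher matcher = pattern.matcher(str);
        List<String> blocks = new ArrayList<>();
        while (matcher.find()) {
            blocks.add(matcher.group(1));
        }
        return blocks.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String xml = "<tag>ABC</tag> <tag>XYZ</tag>";
        System.out.println(Arrays.toString(split(xml, "<tag>", "</tag>"))); // prints [ABC, XYZ]
    }
}
```

With the abc/xyz example the captured blocks keep their surrounding spaces, so the caller would still need to `trim()` them.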

#4


Don't use regexes here. But you don't have to do full-fledged XML parsing either. Use XPath. The expression to search for in your example would be

//tag/text()

The code needed is:

import org.w3c.dom.NodeList;
import org.xml.sax.*;
import javax.xml.xpath.*;

public class Test {

    public static void main(String[] args) throws Exception {

        InputSource ins = new InputSource("c:/users/ndh/hellos.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList list = (NodeList)xpath.evaluate("//bar/text()", ins, XPathConstants.NODESET);
        for (int i = 0; i < list.getLength(); i++) {
            System.out.println(list.item(i).getNodeValue());
        }

    }
}

where my example xml file is

<?xml version="1.0"?>
<foo>
    <bar>hello</bar>
    <bar>ohayoo</bar>
    <bar>hola</bar>
</foo>

This is the most declarative way to do it.
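If the XML is already in memory as a string, as in the question, the same XPath approach can be sketched without a file. This is a variation under assumptions, not the code above: the class name `XPathSplit` and the helper `textOf` are hypothetical, and the input is assumed to be a well-formed document with a single root element (here an illustrative `<root>`).

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSplit {

    // Hypothetical helper: evaluates an XPath expression that selects text
    // nodes and returns their values as an array.
    static String[] textOf(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Read the XML from the string instead of a file path.
        InputSource source = new InputSource(new StringReader(xml));
        NodeList nodes = (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);

        String[] values = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            values[i] = nodes.item(i).getNodeValue();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the fragment from the question wrapped in one root element.
        String xml = "<root><tag>ABC</tag> <tag>XYZ</tag></root>";
        System.out.println(Arrays.toString(textOf(xml, "//tag/text()"))); // prints [ABC, XYZ]
    }
}
```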
