Regex to get the value inside a tag

Time: 2022-10-27 11:04:46

I have a sample set of XML that gets returned:

<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
    ...

  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
    ...
  </site>
</rsp>

I want to extract everything within <name></name>, but not the tags themselves, and only for the first instance (or for an item selected by some other test).

Is this possible with regex?

5 solutions

#1


1  

The best tool for this kind of task is XPath.

NSURL *rspURL = [NSURL fileURLWithPath:[@"~/rsp.xml" stringByExpandingTildeInPath]];
NSXMLDocument *document = [[[NSXMLDocument alloc] initWithContentsOfURL:rspURL options:NSXMLNodeOptionsNone error:NULL] autorelease];

NSArray *nodes = [document nodesForXPath:@"/rsp/site[1]/name" error:NULL];
NSString *name = [nodes count] > 0 ? [[nodes objectAtIndex:0] stringValue] : nil;

If you want the name of the site whose id is 56789, use this XPath instead: /rsp/site[id='56789']/name. I suggest reading the W3Schools XPath tutorial for a quick overview of XPath syntax.
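
For anyone who wants to try the two XPath expressions without a Cocoa project, here is a minimal sketch in Python (my assumption, not the asker's environment) using the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The elided "..." lines from the sample are left out, and the paths are written relative to the root element because that is how ElementTree searches.

```python
import xml.etree.ElementTree as ET

# Sample response from the question, with the elided "..." lines left out.
RSP_XML = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

root = ET.fromstring(RSP_XML)  # root is the <rsp> element

# Paths are relative to <rsp>, so the leading /rsp step is dropped.
first_name = root.findtext("site[1]/name")           # first <site> by position
name_by_id = root.findtext("site[id='56789']/name")  # <site> whose <id> text is 56789

print(first_name)  # testAddress
print(name_by_id)  # ba
```

The position predicate covers the "first instance" case and the child-text predicate covers the "select by some other test" case from the question.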

#2


3  

<disclaimer>I don't use Objective-C</disclaimer>

You should be using an XML parser, not regexes. XML is not a regular language, hence not easily parseable by a regular expression. Don't do it.

Never use regular expressions or basic string parsing to process XML. Every language in common usage right now has perfectly good XML support. XML is a deceptively complex standard, and it's unlikely your code will be correct in the sense that it will properly parse all well-formed XML input. Even if it does, you're wasting your time because (as just mentioned) every language in common usage has XML support. It is unprofessional to use regular expressions to parse XML.

You could use Expat, which has Objective-C bindings.

Apple's options are:

  1. The CF XML parser
  2. The tree-based Cocoa parser (10.4 only)

#3


2  

Without knowing your language or environment, here are some Perl expressions. Hopefully they will give you the right idea for your application.

Your regular expression to capture the text content of a tag would look something like this:

m/>([^<]*)</

This will capture the text between each closing ">" and the next "<", whichever tags they belong to. You will have to loop on the match to extract all content. Note that this does not account for self-closing tags; you would need a regex engine with negative lookbehind to handle those. Without knowing your environment, it's hard to say whether that is supported.
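
To make the "loop on the match" part concrete, here is a rough Python sketch (Python is my choice for a runnable example, not something from the question) applying the same pattern to a shortened, single-line version of the sample. The variable names are made up for illustration.

```python
import re

# Shortened, single-line version of one <site> entry from the question.
xml = "<site><id>1234</id><name>testAddress</name></site>"

# Same pattern as the Perl m/>([^<]*)</ above:
# capture any run of non-"<" characters sitting between a ">" and the next "<".
TEXT_BETWEEN_TAGS = re.compile(r">([^<]*)<")

# Loop over every match; the pattern has no idea which tag it is inside.
captures = [m.group(1) for m in TEXT_BETWEEN_TAGS.finditer(xml)]
print(captures)  # ['', '1234', '', 'testAddress', '']

# Drop the empty gaps between adjacent tags.
non_empty = [text for text in captures if text.strip()]
print(non_empty)  # ['1234', 'testAddress']
```

Note that the id and the name come back in one undifferentiated list, so you would still have to work out which capture belongs to which tag.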

You could also just strip all tags from your source using something like:

s/<[^>]*>//g
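
A quick way to see what that substitution leaves behind, sketched in Python with re.sub (again an assumed language, and a shortened sample):

```python
import re

# Shortened, single-line version of one <site> entry from the question.
xml = "<site><id>1234</id><name>testAddress</name></site>"

# Equivalent of the Perl s/<[^>]*>//g : delete everything that looks like a tag.
stripped = re.sub(r"<[^>]*>", "", xml)

print(stripped)  # 1234testAddress
```

The values run together with nothing to separate them, so this only helps when you want all of the text and don't care where it came from.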

Also, depending on your environment, an XML-parsing library will make your life much easier if you can use one. After all, by taking the regex approach, you lose everything that XML really offers you (structured data, context awareness, etc.).

#4


1  

As others say, you should really be using NSXMLParser for this sort of thing.

HOWEVER, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily:

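NSString * xmlString = ...;
// Each element of captures is itself an array: index 0 is the whole match,
// index 1 is the text captured by the first group.
NSArray * captures = [xmlString arrayOfCaptureComponentsMatchedByRegex:@"<name>(.*?)</name>"];
for (NSArray * captureGroup in captures) {
  NSLog(@"Name: %@", [captureGroup objectAtIndex:1]);
}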
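
Since the question asks for the first instance only, here is the same non-greedy pattern in a small Python sketch (an assumed language, purely for illustration) that stops at the first match instead of collecting all of them. The sample is condensed and the names are hypothetical.

```python
import re

# Condensed version of the sample response from the question.
xml = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name></site>
  <site><id>56789</id><name>ba</name></site>
</rsp>"""

# Non-greedy (.*?) stops at the nearest </name>; DOTALL lets "." cross newlines.
NAME = re.compile(r"<name>(.*?)</name>", re.DOTALL)

match = NAME.search(xml)  # search() returns only the first match, or None
first_name = match.group(1) if match else None
all_names = NAME.findall(xml)

print(first_name)  # testAddress
print(all_names)   # ['testAddress', 'ba']
```

This only holds up while the tag is written exactly as <name> with no attributes or prefix.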

#5


0  

Careful about namespaces:

<prefix:name xmlns:prefix="">testAddress</prefix:name>

is equivalent XML that will break regexp-based code. For XML, use an XML parser. XPath is your friend for things like this. The XPath expression below will return a sequence of strings with the info you want:

./rsp/site/name/text()

Cocoa has NSXML support for XPath.
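
Here is a small Python sketch of the prefix problem (Python and xml.etree.ElementTree are my assumptions, not part of the question). The namespace URI in the sample above is empty, so the sketch uses a made-up one, http://example.com/ns, and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "http://example.com/ns"  # hypothetical namespace URI, for illustration only

# Two documents that bind different prefixes to the same namespace.
doc_a = f'<rsp><site><prefix:name xmlns:prefix="{NS_URI}">testAddress</prefix:name></site></rsp>'
doc_b = f'<rsp xmlns:x="{NS_URI}"><site><x:name>testAddress</x:name></site></rsp>'

def names_via_regex(doc):
    # Looks for the literal text "<name>", so any prefix defeats it.
    return re.findall(r"<name>(.*?)</name>", doc)

def names_via_parser(doc):
    # The query prefix "n" is local to this lookup; only the URI has to match.
    root = ET.fromstring(doc)
    return [el.text for el in root.findall("site/n:name", {"n": NS_URI})]

print(names_via_regex(doc_a), names_via_regex(doc_b))    # [] []
print(names_via_parser(doc_a), names_via_parser(doc_b))  # ['testAddress'] ['testAddress']
```

The parser matches on the namespace URI rather than on whatever prefix the document happens to use, which a pattern over the raw text cannot do.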
