Extracting the inner text from an anchor tag string using a regular expression in JavaScript

Time: 2022-11-27 18:00:03

I am new to AngularJS. I have a regex which gets all the anchor tags. My regex is

/<a[^>]*>([^<]+)<\/a>/g

And I am using the match function on a string like this:

var str =  '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>'

So now I am calling it like this:

var value = str.match(/<a[^>]*>([^<]+)<\/a>/g);

So here I am expecting the output to be abc.jagadale@gmail.com, but I am getting exactly the same string as the input. Can anyone please help me with this? Thanks in advance.

4 answers

#1


1  

Why are you trying to reinvent the wheel?

Parsing an HTML string with a regex is a very complicated task. Just use the DOM or jQuery to get the links' contents; they are made for this.

  • Set the HTML string as the inner HTML of a jQuery/DOM element.

  • Then search that element for all the a elements inside it and return their text contents in an array.

This is how your code should look:

var str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';

var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
  results.push($(this).text());
});

Demo:

var str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';

var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
  results.push($(this).text());
});
console.log(results);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

#2


0  

You need to capture the group inside the anchor tags. The regular expression already matches the inner group ([^<]+), but there are different ways to extract that inner text from a match.

When used without the g flag, the match function returns an array whose first element is the match for the whole regular expression, followed by one element for each capture group in the regular expression.
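A minimal sketch of that behavior with the question's string; dropping the g flag is the only change to the original pattern:

```javascript
// The string from the question.
const str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';

// Without the g flag, match() returns the whole match at index 0,
// followed by one entry per capture group.
const m = str.match(/<a[^>]*>([^<]+)<\/a>/);

console.log(m[0] === str); // true: the full match is the entire anchor tag
console.log(m[1]);         // abc.jagadale@gmail.com
```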

Try this:

var reg = /<a[^>]*>([^<]+)<\/a>/g

reg.exec(str)[1]
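Because reg has the g flag, exec is stateful: each call resumes from reg.lastIndex and returns null once nothing is left. So calling reg.exec(str)[1] a second time on the question's one-anchor string would throw, but the same statefulness allows a loop that collects the inner text of every anchor. A minimal sketch, using made-up two-anchor input:

```javascript
// Made-up input with two anchors (not from the question).
const html = '<a href="mailto:a@x.com">a@x.com</a> <a href="mailto:b@y.com">b@y.com</a>';
const reg = /<a[^>]*>([^<]+)<\/a>/g;

// Each exec() call continues from reg.lastIndex; after the last match it
// returns null and resets lastIndex to 0, which ends the loop.
const texts = [];
let m;
while ((m = reg.exec(html)) !== null) {
  texts.push(m[1]); // index 1 is the ([^<]+) capture group
}
console.log(texts); // [ 'a@x.com', 'b@y.com' ]

// The same statefulness with a single anchor: the second call finds nothing.
const single = '<a href="mailto:a@x.com">a@x.com</a>';
console.log(reg.exec(single)[1]); // a@x.com
console.log(reg.exec(single));    // null
```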

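Also note that match only includes the capture groups when the g flag is not present. With the g flag it returns an array of the full matches only, which is why you got the whole anchor tag back.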
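A sketch of the difference the g flag makes, using the question's string. String.prototype.matchAll is one way to keep the capture group for every match; it is an ES2020 addition, so this assumes a reasonably modern browser or Node 12+:

```javascript
const str = '<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit">abc.jagadale@gmail.com</a>';
const re = /<a[^>]*>([^<]+)<\/a>/g;

// With g, match() returns only the full matches, so the capture group is lost
// and the question's code gets the whole anchor tag back.
const full = str.match(re);
console.log(full.length, full[0] === str); // 1 true

// matchAll() (which requires the g flag) yields a full match array per hit,
// so index 1 of each one is the inner text.
const inner = Array.from(str.matchAll(re), (m) => m[1]);
console.log(inner); // [ 'abc.jagadale@gmail.com' ]
```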

Check https://javascript.info/regexp-groups for further documentation.

#3


0  

Brief

Don't use regex for this. Regex is a great tool, don't get me wrong, but it's not what you're looking for. Regex cannot properly parse HTML and should only be used to do so if it's a limited, known set of HTML.

Try, for example, adding content:'>' to your style attribute (a > inside a quoted attribute value is valid HTML). You'll see that your pattern now fails or gives you an incorrect result. I don't like to use this quote all the time, but I think it's necessary in this case:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
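To make the failure concrete, here is a small sketch that runs the question's pattern against two hypothetical variations of the anchor: one with content:'>' added to the style attribute (single quotes keep the double-quoted attribute valid), and one whose link text is wrapped in a <b> element:

```javascript
const re = /<a[^>]*>([^<]+)<\/a>/;

// A ">" inside a quoted attribute value is valid HTML, but [^>]* stops at it,
// so leftover attribute characters end up in the captured "inner text".
const withGt = `<a href="mailto:abc.jagadale@gmail.com" style="color:inherit;text-decoration:inherit;content:'>'">abc.jagadale@gmail.com</a>`;
const bad = withGt.match(re)[1];
console.log(bad); // '">abc.jagadale@gmail.com (not just the address)

// Link text wrapped in another element: [^<]+ cannot cross the <b> tag,
// so the pattern does not match at all.
const nested = '<a href="mailto:abc.jagadale@gmail.com"><b>abc.jagadale@gmail.com</b></a>';
console.log(nested.match(re)); // null
```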

Use the built-in DOM functions instead. jQuery makes this super easy to accomplish; see my Code section for a demonstration. It's far more legible than any regex variant.


Code

DOM from page

The following snippet gets all anchors on the actual page.

$("a").each(function() {
  console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a href="mailto:abc.jagadale@gmail.com">abc.jagadale@gmail.com</a>
<a href="mailto:abc2.jagadale@gmail.com">abc2.jagadale@gmail.com</a>

DOM in string

The following snippet gets all anchors in the string (after converting it to a DOM element).

var s = `<a href="mailto:email3@domain.com">email3@domain.com</a>
<a href="mailto:email4@domain.com">email4@domain.com</a>`

$("<div></div>").html(s).find("a").each(function() {
  console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a href="mailto:email1@domain.com">email1@domain.com</a>
<a href="mailto:email2@domain.com">email2@domain.com</a>

#4


0  

Since the use case is parsing a string rather than working with an actual DOM, regex does seem like the way to go, unless you want to load the HTML into a document fragment and parse that.

One way to get all of your matches is to use split:

var htmlstr = "<p><a href='url'>asdf@bsdf.com</a></p>"

var matches = htmlstr.split(/<a.+?>([A-Za-z.@]+)<\/a>/).filter((t, i) => i % 2)

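When the regex passed to split contains a capturing group, the resulting array holds the text around each match with the captured text spliced in between, so the captures sit at the odd indices; filtering by index % 2 pares it down to just the regex matches.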
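A sketch of the split approach with more than one anchor. The helper name anchorTexts is ours, not from the answer; the regex and filter are unchanged. Note that the character class [A-Za-z.@] only accepts letters, dots and @, so it assumes addresses without digits, hyphens or underscores:

```javascript
// Hypothetical helper wrapping the answer's split + filter.
function anchorTexts(htmlstr) {
  // With a capturing group in the separator, split() splices each captured
  // inner text between the surrounding pieces, so captures sit at odd indices.
  return htmlstr.split(/<a.+?>([A-Za-z.@]+)<\/a>/).filter((t, i) => i % 2);
}

console.log(anchorTexts("<p><a href='url'>asdf@bsdf.com</a> and <a href='url2'>qwer@zxcv.com</a></p>"));
// [ 'asdf@bsdf.com', 'qwer@zxcv.com' ]

// An address containing a digit is not matched, so it is silently skipped.
console.log(anchorTexts("<a href='mailto:abc2@x.com'>abc2@x.com</a>")); // []
```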
