Remove special characters from a string, except those inside HTML tags [duplicate]

Date: 2022-08-27 17:04:20

I have a string with HTML tags and special characters (\,:). I removed the special characters from the string using .replace(/[\:]/g,''), but the problem is that my string contains HTML tags with a style property, so the regex I used also removes the : from the style property of the span tags.

I do not want to remove the : from the style property of the span tags.

Can anyone suggest a solution for this?

Here is a link to regex101: https://regex101.com/r/UAOuDG/1
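
To make the problem concrete, here is a minimal sketch that reproduces it. The sample string is borrowed from the answers on this page and the variable names are only illustrative. It also shows that the character class `[\:]` is just an escaped colon, so it never touches a backslash.

```javascript
// Sample input borrowed from the answers on this page.
var input = "By: <span style='background-color:#ffc8c4;'>Anita</span> and : Sir Alex Ferguson";

// The replacement from the question: every colon goes, including the CSS one.
var naive = input.replace(/[\:]/g, '');
console.log(naive);

// `[\:]` matches only ":", so the backslash in this string is left in place.
var withBackslash = "a\\b:c".replace(/[\:]/g, '');
console.log(withBackslash);
```

To match a backslash as well, the class would need an escaped backslash, for example `[\\:]`.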

4 answers

#1


2  

Don't do this, but if you have to do it anyway, there's a workaround (not 100% guaranteed):

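var str = "By: <span style='background-color:#ffc8c4;'>Anita</span> <span style='background-color:#ffc8c4;'>Elberse</span> and : Sir Alex Ferguson";

// Match whole opening tags first so their colons are skipped; only colons
// outside tags end up in the capture group.
console.log(str.replace(/<\w+(?=[ >])[^<>]*>|(:)/g, function(match, colon) {
    // Drop a captured colon; return a matched tag unchanged.
    return colon ? '' : match;
}));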

Regex explanation:


<\w+(?=[ >]) # Begin matching opening tags
[^<>]*>      # Up to end
|            # Or (then)
(:)          # Any remaining colons
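
The question mentions more than one special character (\ and :), so the same idea can be wrapped in a small helper. This is only a sketch under the same caveats as the pattern above: `stripOutsideTags` and its parameters are hypothetical names, not part of the original answer.

```javascript
// Hypothetical helper: removes every character listed in `chars` from `html`,
// except where it appears inside an opening tag such as <span style='...'>.
function stripOutsideTags(html, chars) {
  // Escape the characters that are special inside a character class.
  var escaped = chars.replace(/[\\\]^-]/g, '\\$&');
  var pattern = new RegExp('<\\w+(?=[ >])[^<>]*>|([' + escaped + '])', 'g');
  return html.replace(pattern, function (match, special) {
    // `special` is set only when the second alternative matched.
    return special ? '' : match;
  });
}

var sample = "By: <span style='background-color:#ffc8c4;'>Anita</span> and : Sir \\Alex";
console.log(stripOutsideTags(sample, ':\\'));
```

Because opening tags are consumed by the first alternative, the characters inside them are never offered to the capture group.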

#2


1  

DOM: the right way

I wasn't going to add a DOM-based approach, since I respect the tags on each question. This answer is for the downvoters who did not leave a comment explaining their reason:

// Build our XPath query: text nodes that are direct children of <body>
var textNodes = document.evaluate("//body/text()", document, null, XPathResult.ANY_TYPE, null);
// Hold a pointer to the current node
var currentText = textNodes.iterateNext();
var list = [];
// Collect all nodes first, because modifying the document invalidates the iterator
while (currentText) {
  list.push(currentText);
  currentText = textNodes.iterateNext();
}
// Remove every colon from each text node (a string pattern would replace only the first one)
list.forEach(function(x) {
  x.textContent = x.textContent.replace(/:/g, '');
});

<body>
  By: <span style='background-color:#ffc8c4;'>Anita</span> <span style='background-color:#ffc8c4;'>Elberse</span> and : Sir Alex Ferguson
</body>
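
The XPath expression `//body/text()` selects only text nodes that are direct children of `<body>`, so text inside the spans is not visited. If nested text should be cleaned too, a recursive walk is one option. The sketch below is an assumption-laden illustration, not the answerer's code: `stripColonsInText` is a hypothetical name, and it relies only on the standard `childNodes`, `nodeType`, and `nodeValue` properties.

```javascript
// Hypothetical helper: strips every colon from the text nodes below `root`,
// leaving element attributes (such as style) untouched.
function stripColonsInText(root) {
  var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  for (var i = 0; i < root.childNodes.length; i++) {
    var child = root.childNodes[i];
    if (child.nodeType === TEXT_NODE) {
      child.nodeValue = child.nodeValue.replace(/:/g, '');
    } else {
      // Recurse into elements and other container nodes.
      stripColonsInText(child);
    }
  }
}

// In a browser: stripColonsInText(document.body);
```

Note that this would also rewrite the text of inline script or style elements under `root`, so it is safer to call it on a container that holds only the content to be cleaned.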

#3


0  

First, note that this isn't a foolproof solution. It's easy to break if you want to, but it'll handle many normal cases. Now, replacing

((['"])(?:\\.|(?!\2).)*\2)|:|([^'":]*)

with

$1$3

will remove every : that isn't inside quotes.

It starts off by trying to match and capture a whole quoted string. If that doesn't match, it tries to match a colon. If that doesn't match either, everything up to the next colon or quote is matched and captured.

Now, if it was a quoted string, it's in capture group 1. If it was neither a string nor a colon, it's in group 3. (Group 2 is used internally, as a backreference, to match the closing quote.)

To keep everything we want, we replace the match with groups 1 and 3, one of which will hold the captured text.

Note that the matched string can be either single- or double-quoted and may also contain escaped quotes.

See it here at regex101.


var str="By: <span style='background-color:#ffc8c4;'>Anita</span> <span style=\"background-color:#ffc8c4;\">Elberse</span> and Sir Alex Ferguson";

console.log(str.replace(/((['"])(?:\\.|(?!\2).)*\2)|:|([^'":]*)/g, '$1$3'));
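
To see which alternative handles which part of the input, the matches can be classified one by one. This is only an illustrative sketch: the shortened sample string and the `pieces` array are assumptions made for the demonstration.

```javascript
var pattern = /((['"])(?:\\.|(?!\2).)*\2)|:|([^'":]*)/g;
var sample = "By: <span style='color:red'>A</span>";

// Classify each non-empty match by the alternative that produced it.
var pieces = [];
sample.replace(pattern, function (match, quoted) {
  if (match === '') return '';            // third alternative can match nothing
  if (quoted !== undefined) pieces.push(['quoted', match]);
  else if (match === ':') pieces.push(['colon', match]);
  else pieces.push(['other', match]);
  return '';
});

console.log(pieces);
```

Only the pieces labelled `colon` are dropped by the `$1$3` replacement. Quoted text outside a tag is protected in the same way as an attribute value, which is one reason the approach is not foolproof.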

#4


0  

Try this:

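var a = "By: <span style='background-color:#ffc8c4;'>Anita</span> <span style='background-color:#ffc8c4;'>Elberse</span> and Sir : Alex Ferguson";
// Remove each colon unless the text after it reaches a ">" before any "<" (that is, the colon is inside a tag)
var b = a.replace(/(?!([^<]+>))+:/g, "");
console.log("original :", a);
console.log("replaced :", b);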
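
The pattern works because the negative lookahead rejects a colon whenever the text after it reaches a `>` before any `<`. The `+` quantifier and the capture group are not needed for that, and a quantified lookahead is a syntax error under the `u` flag. A simpler form that should behave the same on inputs like this one might look as follows; `removeColonsOutsideTags` is a hypothetical name.

```javascript
// Hypothetical helper: removes colons that are not inside a tag.
function removeColonsOutsideTags(html) {
  // A colon is kept when the next angle bracket after it is ">",
  // which suggests the colon sits inside <...>.
  return html.replace(/:(?![^<>]*>)/g, '');
}

var text = "By: <span style='background-color:#ffc8c4;'>Anita</span> and Sir : Alex Ferguson";
console.log(removeColonsOutsideTags(text));
```

Both forms share one limitation: a literal `>` in ordinary text (for example `a: b > c`) makes the preceding colon look as if it were inside a tag, so it is kept.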
