Wrapping Hebrew and English text in a div

Time: 2022-11-13 18:45:32

I am trying to wrap each run of Hebrew and each run of English in a paragraph in its own span tag. E.g. "so היי all whats up אתכם?" should become:

[span]so[/span][span]היי[/span][span]all whats up[/span][span]אתכם[/span]

I have been trying with a regexp, but it just removes the Hebrew words and joins the English words in one span.

var str = 'so היי all whats up אתכם?'
var match= str.match(/(\b[a-z]+\b)/ig);
var replace = match.join().replace(match.join(),'<span>'+match.join()+'</span>')

3 solutions

#1


9  

Previous answers here did not account for the whole-word requirement. It is indeed difficult to achieve, because the \b word boundary does not recognize boundaries next to Hebrew Unicode letters, which we match here with a character class written in \u notation.
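
To make the \b limitation concrete, here is a minimal sketch. The variable names are only illustrative, and the sample string is a shortened version of the one in the question. It checks the same Hebrew word once with \b and once with an explicit Hebrew range:

```javascript
// In JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_].
const sample = 'so היי all';

// The English word is found, because it sits between word and non-word characters.
const englishFound = /\bso\b/.test(sample);

// The Hebrew word is not found: a space next to a Hebrew letter is
// "non-word next to non-word", so \b never matches there.
const hebrewFoundWithB = /\bהיי\b/.test(sample);

// An explicit Hebrew range plus a negative look-ahead finds it.
const hebrewFoundWithRange =
  /(^|[^\u0590-\u05FF])(היי)(?![\u0590-\u05FF])/.test(sample);

console.log(englishFound, hebrewFoundWithB, hebrewFoundWithRange);
// prints: true false true
```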

I suggest using a look-ahead and capturing groups to capture whole Hebrew words: (^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF]). The first group makes sure there is a non-Hebrew symbol or the start of the string before a Hebrew word (add a \s to the Hebrew character class if there are spaces between the Hebrew words). Use \b[a-z\s]+\b to match a sequence of whole English words separated by spaces.

If you plan to insert the <span> tags into a sentence around whole words, here is a snippet that may help:

var str = 'so היי all whats up אתכם?';
//var str = 'so, היי, all whats up אתכם?';
var result = str.replace(/\s*(\b[a-z\s]+\b)\s*/ig, '<span>$1</span>');
result = result.replace(/(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])/g, '$1<span>$2</span>');
document.getElementById("r").innerHTML = result;
span {
    background:#FFCCCC;
    border:1px solid #0000FF;
}
<div width="645" id="r"/>

Result:

<span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>?

If you do not need any punctuation or alphanumeric tokens in your output, only the whole English and Hebrew words concatenated, then use:

var str = 'היי, User234, so 222היי all whats up אתכם?';
var re = /(^|[^\u0590-\u05FF])([\u0590-\u05FF]+)(?![\u0590-\u05FF])|(\b[a-z\s]+\b)/ig;
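var res = [];
var m;
while ((m = re.exec(str)) !== null) {
    // Guard against a zero-length match looping forever.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    if (m[1] !== undefined) {
        // Hebrew alternative matched: the word itself is in group 2.
        res.push('<span>' + m[2].trim() + '</span>');
    } else {
        // English alternative matched: the run of words is in group 3.
        res.push('<span>' + m[3].trim() + '</span>');
    }
}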
document.getElementById("r").innerHTML = res.join("");
span {
    background:#FFCCCC;
    border:1px solid #0000FF;
}
<div width="645" id="r"/>

Result:

<span>היי</span><span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>
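
As an alternative to the two patterns above, here is a single-pass sketch that assumes an ES2018+ engine with Unicode property escapes and lookbehind. It is not the approach used in this answer. wrapRuns is a hypothetical helper name. The behavior differs in two ways: the text between runs (spaces, punctuation) is kept, and a word glued to a digit or another letter, such as "User234" or "222היי", is skipped entirely.

```javascript
// Hypothetical helper, not part of the original answer.
// A "run" is one or more words of the same script separated by whitespace.
function wrapRuns(text) {
  const run = new RegExp(
    '(?<![\\p{L}\\p{N}_])' +                             // not preceded by a letter, digit or _
    '(?:\\p{Script=Hebrew}+(?:\\s+\\p{Script=Hebrew}+)*' + // a run of Hebrew words
    '|[a-z]+(?:\\s+[a-z]+)*)' +                          // or a run of English words
    '(?![\\p{L}\\p{N}_])',                               // not followed by a letter, digit or _
    'giu'
  );
  return text.replace(run, (match) => `<span>${match}</span>`);
}

console.log(wrapRuns('so היי all whats up אתכם?'));
// prints: <span>so</span> <span>היי</span> <span>all whats up</span> <span>אתכם</span>?
```

Because the match is passed through a callback instead of a `$1` template, nothing in the wrapped text is interpreted as a replacement pattern.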

#2


1  

I think the regex you want is something like [^a-z\u0591-\u05F4\s]. Only the leading ^ negates the class; a ^ anywhere else inside the brackets is a literal caret. I'm not entirely sure how you want to handle spaces.

My solution

Copy str to a new variable (finalStr in the code), removing any characters that aren't a-z, Hebrew, or whitespace.
Loop over the runs of English (a-z) characters in str and wrap each one in a span, using finalStr.replace.
Do the same again for the Hebrew characters.

It's not quite 100%, but seems to work well enough IMO.

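var str = 'so היי all whats up אתכם?';
// Remove everything that is not a Latin letter, a Hebrew character, or whitespace.
var finalStr = str.replace(/[^a-z\u0591-\u05F4\s]/gi, '');

// Wrap each run of English words.
var rgx = /[a-z ]+/gi;
var mat = str.match(rgx) || [];

for (var i = 0; i < mat.length; ++i) {
    var match = mat[i].trim();
    if (match) { // skip runs that contain only spaces
        finalStr = finalStr.replace(match, '<span>' + match + '</span>');
    }
}

// Wrap each run of Hebrew words.
rgx = /[\u0591-\u05F4 ]+/g;
mat = str.match(rgx) || [];

for (var i = 0; i < mat.length; ++i) {
    var match = mat[i].trim();
    if (match) { // a lone space between English words also matches this class
        finalStr = finalStr.replace(match, '<span>' + match + '</span>');
    }
}

document.getElementById('res').innerHTML = finalStr;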

http://jsfiddle.net/daveSalomon/0ns6nuxy/1/

#3


0  

Judging by this post you can try something like this: ((?:\s*\w+)+|(?:\s*[\u0590-\u05FF]+)+?(?=\s?[A-Za-z0-9!?.])) https://regex101.com/r/kA3yV5/4

You may need to edit it for your particular cases (for example, if some non-word characters start to appear), but it does the trick. It first tries to match words and build a sentence from the English character list. If that fails, it tries to build words/sentences from the Hebrew character list until an English character is spotted again.

It's not perfect yet: you may want to add other punctuation characters, and some matches start with spaces you don't want. JavaScript did not support lookbehinds when this was written (they were added in ES2018), so I didn't figure out a good way to drop those spaces in the pattern itself, but since they sit at the start of each match they can be removed from the string afterwards.
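
A minimal sketch of how this pattern could be applied, assuming the g flag is added so that every run is returned and that trimming each match is an acceptable way to drop the leading spaces. wrapWithPattern is a hypothetical helper name:

```javascript
// Pattern from this answer, with the g flag added so match() returns every run.
const runPattern =
  /((?:\s*\w+)+|(?:\s*[\u0590-\u05FF]+)+?(?=\s?[A-Za-z0-9!?.]))/g;

// Hypothetical helper: wraps each matched run in a span after trimming it.
function wrapWithPattern(text) {
  const runs = text.match(runPattern) || [];
  return runs
    .map((run) => run.trim())              // drop the unwanted leading spaces
    .map((run) => `<span>${run}</span>`)
    .join('');
}

console.log(wrapWithPattern('so היי all whats up אתכם?'));
// prints: <span>so</span><span>היי</span><span>all whats up</span><span>אתכם</span>
```

Note that the look-ahead requires a following character, so a Hebrew run at the very end of the string, with no trailing punctuation, is not matched by this pattern as written.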
