Decode &amp; back to & in JavaScript

Date: 2022-10-17 08:32:19

I have strings like

var str = 'One &amp; two &amp; three';

rendered into HTML by the web server. I need to transform those strings into

'One & two & three'

Currently, that's what I am doing (with help of jQuery):

$(document.createElement('div')).html('{{ driver.person.name }}').text()

However I have an unsettling feeling that I am doing it wrong. I have tried

unescape("&amp;")

but it doesn't seem to work, and neither do decodeURI or decodeURIComponent.
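
For context, here is a minimal sketch of why those built-ins have no effect in this case: unescape, decodeURI and decodeURIComponent reverse percent-encoding (the scheme used in URLs), which is unrelated to HTML character references such as &amp;. The sample strings and variable names below are illustrative assumptions.

```javascript
// Percent-encoding (URLs) and HTML entities are two unrelated escaping schemes.
var percentEncoded = 'One%20%26%20two';
var entityEncoded = 'One &amp; two';

// decodeURIComponent reverses percent-encoding...
var fromPercent = decodeURIComponent(percentEncoded);

// ...but leaves HTML entities untouched, because they contain no %XX sequences.
var fromEntity = decodeURIComponent(entityEncoded);

console.log(fromPercent); // One & two
console.log(fromEntity);  // One &amp; two
```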

Are there any other, more native and elegant ways of doing so?

11 answers

#1


39  

A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (documented on MDN). This lets you use the browser's native HTML parser to convert a string to an HTML document. It has been supported in new versions of all major browsers since late 2014.

If we just want to decode some text content, we can put it as the sole content in a document body, parse the document, and pull out its .body.textContent.

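var encodedStr = 'hello &amp; world';

// Parse the string as the sole content of an HTML document body.
var parser = new DOMParser();
var dom = parser.parseFromString(
    '<!doctype html><body>' + encodedStr,
    'text/html');
// textContent holds the decoded text.
var decodedString = dom.body.textContent;

console.log(decodedString); // hello & world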

We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.

The parseFromString(str, type) method must run these steps, depending on type:

  • "text/html"

    Parse str with an HTML parser, and return the newly created Document.

    The scripting flag must be set to "disabled".

    NOTE

    script elements get marked unexecutable and the contents of noscript get parsed as markup.

It's beyond the scope of this question, but please note that if you're taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it's possible that their scripting would be reenabled, and there could be security concerns. I haven't researched it, so please exercise caution.

#2


233  

Do you need to decode all encoded HTML entities or just &amp; itself?

If you only need to handle &amp; then you can do this:

var decoded = encoded.replace(/&amp;/g, '&');
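
If a few more named entities need the same treatment, the order of the replacements matters. The following is a minimal sketch, not part of the original answer: the function name decodeBasicEntities and the choice of entities are illustrative assumptions. It replaces &amp; last, so that doubly encoded text such as &amp;lt; is decoded by one level only.

```javascript
// Decode a handful of named entities with plain string replacement.
// decodeBasicEntities is a hypothetical helper name used for illustration.
function decodeBasicEntities(encoded) {
    return String(encoded)
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        // &amp; goes last: replacing it first would turn '&amp;lt;' into
        // '&lt;' and then into '<', decoding two levels instead of one.
        .replace(/&amp;/g, '&');
}

console.log(decodeBasicEntities('One &amp; two &lt;three&gt;')); // One & two <three>
console.log(decodeBasicEntities('&amp;lt;'));                    // &lt;
```

Any entity outside this short list is left as it is, so this only suits input whose escaping scheme is known in advance.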

If you need to decode all HTML entities then you can do it without jQuery:

var elem = document.createElement('textarea');
elem.innerHTML = encoded;
var decoded = elem.value;

Please take note of Mark's comments below, which highlight security holes in an earlier version of this answer and recommend using a textarea rather than a div to mitigate potential XSS vulnerabilities. These vulnerabilities exist whether you use jQuery or plain JavaScript.

#3


25  

Mathias Bynens has a library for this: https://github.com/mathiasbynens/he

Example:

console.log(
    he.decode("J&#246;rg &amp J&#xFC;rgen rocked to &amp; fro ")
);
// Logs "Jörg & Jürgen rocked to & fro"

I suggest favouring it over hacks involving setting an element's HTML content and then reading back its text content. Such approaches can work, but are deceptively dangerous and present XSS opportunities if used on untrusted user input.

If you really can't bear to load in a library, you can use the textarea hack described in this answer to a near-duplicate question, which, unlike various similar approaches that have been suggested, has no security holes that I know of:

function decodeEntities(encodedString) {
    var textArea = document.createElement('textarea');
    textArea.innerHTML = encodedString;
    return textArea.value;
}

console.log(decodeEntities('1 &amp; 2')); // '1 & 2'

But take note of the security issues, affecting similar approaches to this one, that I list in the linked answer! This approach is a hack, and future changes to the permissible content of a textarea (or bugs in particular browsers) could lead to code that relies upon it suddenly having an XSS hole one day.

#4


23  

var htmlEnDeCode = (function() {
    var charToEntityRegex,
        entityToCharRegex,
        charToEntity,
        entityToChar;

    function resetCharacterEntities() {
        charToEntity = {};
        entityToChar = {};
        // add the default set
        addCharacterEntities({
            '&amp;'     :   '&',
            '&gt;'      :   '>',
            '&lt;'      :   '<',
            '&quot;'    :   '"',
            '&#39;'     :   "'"
        });
    }

    function addCharacterEntities(newEntities) {
        var charKeys = [],
            entityKeys = [],
            key, echar;
        for (key in newEntities) {
            echar = newEntities[key];
            entityToChar[key] = echar;
            charToEntity[echar] = key;
            charKeys.push(echar);
            entityKeys.push(key);
        }
        charToEntityRegex = new RegExp('(' + charKeys.join('|') + ')', 'g');
        entityToCharRegex = new RegExp('(' + entityKeys.join('|') + '|&#[0-9]{1,5};' + ')', 'g');
    }

    function htmlEncode(value){
        var htmlEncodeReplaceFn = function(match, capture) {
            return charToEntity[capture];
        };

        return (!value) ? value : String(value).replace(charToEntityRegex, htmlEncodeReplaceFn);
    }

    function htmlDecode(value) {
        var htmlDecodeReplaceFn = function(match, capture) {
            return (capture in entityToChar) ? entityToChar[capture] : String.fromCharCode(parseInt(capture.substr(2), 10));
        };

        return (!value) ? value : String(value).replace(entityToCharRegex, htmlDecodeReplaceFn);
    }

    resetCharacterEntities();

    return {
        htmlEncode: htmlEncode,
        htmlDecode: htmlDecode
    };
})();

This is from the ExtJS source code.
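
One limitation worth noting in the snippet above: its pattern only matches decimal references of up to five digits, and String.fromCharCode cannot produce a character beyond U+FFFF from a single number. Below is a minimal sketch of a numeric-reference decoder that also accepts hexadecimal forms and uses String.fromCodePoint (ES2015). The name decodeNumericEntities is hypothetical, this is not ExtJS code, and it does not implement the special cases the HTML specification defines for certain code points.

```javascript
// Decode decimal (&#246;) and hexadecimal (&#xF6;) character references.
// decodeNumericEntities is a hypothetical helper name used for illustration.
function decodeNumericEntities(value) {
    return String(value).replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
        function (match, hex, dec) {
            var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
            // Leave out-of-range references as they are instead of throwing.
            if (codePoint > 0x10FFFF) {
                return match;
            }
            return String.fromCodePoint(codePoint);
        });
}

console.log(decodeNumericEntities('J&#246;rg &#x1F600;'));
```

Named entities such as &amp; are deliberately not handled here; a lookup table like the one above would still be needed for those.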

#5


12  

element.innerText also does the trick.

#6


4  

First, create a <span id="decodeIt" style="display:none;"></span> somewhere in the body.

Next, assign the string to be decoded as innerHTML to this:

document.getElementById("decodeIt").innerHTML=stringtodecode

Finally,

stringtodecode=document.getElementById("decodeIt").innerText

Here is the overall code:

var stringtodecode="<B>Hello</B> world<br>";
document.getElementById("decodeIt").innerHTML=stringtodecode;
stringtodecode=document.getElementById("decodeIt").innerText

#7


3  

jQuery will encode and decode for you. However, you need to use a textarea tag, not a div.

var str1 = 'One & two & three';
var str2 = "One &amp; two &amp; three";
  
$(document).ready(function() {
   $("#encoded").text(htmlEncode(str1)); 
   $("#decoded").text(htmlDecode(str2));
});

function htmlDecode(value) {
  return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
  return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>

<div id="encoded"></div>
<div id="decoded"></div>

#8


1  

For fans of one-liners:

const htmlDecode = innerHTML => Object.assign(document.createElement('textarea'), {innerHTML}).value;

console.log(htmlDecode('Complicated - Dimitri Vegas &amp; Like Mike'));

#9


1  

In case you're looking for it, as I was: there is now a nice and safe jQuery method.

https://api.jquery.com/jquery.parsehtml/

For example, you can type this in your console:

var x = "test &amp;";
> undefined
$.parseHTML(x)[0].textContent
> "test &"

So $.parseHTML(x) returns an array, and if you have HTML markup within your text, the array's length will be greater than 1.

#10


0  

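A JavaScript solution that catches the common ones:

var map = {amp: '&', lt: '<', gt: '>', quot: '"', '#039': "'"};
// Leave entities that are not in the map untouched instead of replacing them with "undefined".
str = str.replace(/&([^;]+);/g, (m, c) => Object.prototype.hasOwnProperty.call(map, c) ? map[c] : m);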

This is the reverse of https://*.com/a/4835406/2738039

#11


0  

You can use Lodash's unescape / escape functions: https://lodash.com/docs/4.17.5#unescape

import unescape from 'lodash/unescape';

const str = unescape('fred, barney, &amp; pebbles');

str will become 'fred, barney, & pebbles'. Note that Lodash's unescape only converts &amp;, &lt;, &gt;, &quot; and &#39;.
