Sanitizing user input before adding it to the DOM in JavaScript

I'm writing the JS for a chat application I'm working on in my free time, and I need to have HTML identifiers that change according to user submitted data. This is usually something conceptually shaky enough that I would not even attempt it, but I don't see myself having much of a choice this time. What I need to do then is to escape the HTML id to make sure it won't allow for XSS or breaking HTML.

Here's the code:

var user_id = escape(id)
var txt = '<div class="chut">'+
            '<div class="log" id="chut_'+user_id+'"></div>'+
            '<textarea id="chut_'+user_id+'_msg"></textarea>'+
            '<label for="chut_'+user_id+'_to">To:</label>'+
            '<input type="text" id="chut_'+user_id+'_to" value='+user_id+' readonly="readonly" />'+
            '<input type="submit" id="chut_'+user_id+'_send" value="Message"/>'+
          '</div>';

What would be the best way to escape id to avoid any kind of problem mentioned above? As you can see, right now I'm using the built-in escape() function, but I'm not sure of how good this is supposed to be compared to other alternatives. I'm mostly used to sanitizing input before it goes in a text node, not an id itself.

6 solutions

#1

Never use escape(). It's nothing to do with HTML-encoding. It's more like URL-encoding, but it's not even properly that. It's a bizarre non-standard encoding available only in JavaScript.
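
To make the difference concrete, here is a small sketch comparing the two encoders on a made-up sample string (the variable names are only for illustration). Neither one produces HTML entities:

```javascript
// A sample containing markup characters and one non-ASCII letter.
var sample = '<é>';

// escape() percent-encodes using the character's code unit (Latin-1 style for é).
var legacy = escape(sample);            // '%3C%E9%3E'

// encodeURIComponent() percent-encodes the UTF-8 bytes instead.
var uri = encodeURIComponent(sample);   // '%3C%C3%A9%3E'
```

Both results are full of % signs, and the two functions do not even agree with each other on non-ASCII input.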

If you want an HTML encoder, you'll have to write it yourself as JavaScript doesn't give you one. For example:

function encodeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

However whilst this is enough to put your user_id in places like the input value, it's not enough for id because IDs can only use a limited selection of characters. (And % isn't among them, so escape() or even encodeURIComponent() is no good.)

You could invent your own encoding scheme to put any characters in an ID, for example:

function encodeID(s) {
    if (s==='') return '_';
    return s.replace(/[^a-zA-Z0-9.-]/g, function(match) {
        return '_'+match[0].charCodeAt(0).toString(16)+'_';
    });
}
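
As a rough usage sketch, the snippet below repeats encodeID so it runs on its own and adds a decodeID helper. decodeID is hypothetical (it is not part of the scheme above) and is only there to show that the encoding can be reversed, so two different user ids never produce the same element id:

```javascript
// encodeID as given above, repeated so this snippet is self-contained.
function encodeID(s) {
    if (s === '') return '_';
    return s.replace(/[^a-zA-Z0-9.-]/g, function(match) {
        return '_' + match.charCodeAt(0).toString(16) + '_';
    });
}

// Hypothetical inverse: turns each _hex_ run back into its character.
function decodeID(id) {
    if (id === '_') return '';
    return id.replace(/_([0-9a-f]+)_/g, function(match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var encoded = encodeID('bob <admin>');   // 'bob_20__3c_admin_3e_'
var elementId = 'chut_' + encoded;       // usable as an HTML id
var roundTrip = decodeID(encoded);       // 'bob <admin>'
```

Because a literal underscore in the input is itself encoded (as _5f_), every underscore in the output belongs to an escape sequence, which is what keeps the scheme unambiguous.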

But you've still got a problem if the same user_id occurs twice. And to be honest, the whole thing with throwing around HTML strings is usually a bad idea. Use DOM methods instead, and retain JavaScript references to each element, so you don't have to keep calling getElementById, or worrying about how arbitrary strings are inserted into IDs.

eg.:

function addChut(user_id) {
    var log= document.createElement('div');
    log.className= 'log';
    var textarea= document.createElement('textarea');
    var input= document.createElement('input');
    input.value= user_id;
    input.readOnly= true;
    var button= document.createElement('input');
    button.type= 'button';
    button.value= 'Message';

    var chut= document.createElement('div');
    chut.className= 'chut';
    chut.appendChild(log);
    chut.appendChild(textarea);
    chut.appendChild(input);
    chut.appendChild(button);
    document.getElementById('chuts').appendChild(chut);

    button.onclick= function() {
        alert('Send '+textarea.value+' to '+user_id);
    };

    return chut;
}

You could also use a convenience function or JS framework to cut down on the lengthiness of the create-set-appends calls there.

ETA:

I'm using jQuery at the moment as a framework

OK, then consider the jQuery 1.4 creation shortcuts, eg.:

var log= $('<div>', {'class': 'log'});
var input= $('<input>', {readOnly: true, val: user_id});
...

The problem I have right now is that I use JSONP to add elements and events to a page, and so I can not know whether the elements already exist or not before showing a message.

You can keep a lookup of user_id to element nodes (or wrapper objects) in JavaScript, to save putting that information in the DOM itself, where the characters that can go in an id are restricted.

var chut_lookup= {};
...

function getChut(user_id) {
    var key= '_map_'+user_id;
    if (key in chut_lookup)
        return chut_lookup[key];
    return chut_lookup[key]= addChut(user_id);
}

(The _map_ prefix is because JavaScript objects don't quite work as a mapping of arbitrary strings. The empty string and, in IE, some Object member names, confuse it.)
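
Where an ES2015 Map is available, the prefix workaround is unnecessary, since a Map accepts any string as a key. A minimal sketch, assuming a factory function is passed in (createChut and makeChutRegistry are hypothetical names; in the code above the factory would be addChut):

```javascript
// Sketch: cache one chat box per user_id in a Map, so any string is a safe key.
// createChut is a stand-in for whatever builds the element, e.g. addChut above.
function makeChutRegistry(createChut) {
    var chuts = new Map();
    return function getChut(user_id) {
        if (!chuts.has(user_id)) {
            chuts.set(user_id, createChut(user_id));
        }
        return chuts.get(user_id);
    };
}

// Usage with a dummy factory that just records what it was asked to build.
var created = [];
var getChut = makeChutRegistry(function(user_id) {
    created.push(user_id);
    return {user: user_id};
});

var first = getChut('');        // the empty string works as a key
getChut('constructor');         // so do names that exist on Object.prototype
var again = getChut('');        // the second call reuses the cached object
```

The factory runs once per distinct user_id, so a JSONP callback can call getChut freely without checking whether the elements already exist.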

#2

Another approach that I like is to use the native DOM capabilities: http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript

#3

You could use a simple regular expression to assert that the id only contains allowed characters, like so:

if(id.match(/^[0-9a-zA-Z]{1,16}$/)){
    //The id is fine
}
else{
    //The id is illegal
}

My example allows only alphanumeric characters and strings of length 1 to 16. You should change it to match the type of ids that you use.
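
Wrapped in a function, the same check might look like the sketch below. isValidId and ID_PATTERN are hypothetical names, and the pattern is just the example whitelist from this answer:

```javascript
// Whitelist: 1 to 16 alphanumeric characters. Adjust to your own id rules.
var ID_PATTERN = /^[0-9a-zA-Z]{1,16}$/;

// Returns true only for strings made up entirely of whitelisted characters.
function isValidId(id) {
    return typeof id === 'string' && ID_PATTERN.test(id);
}

isValidId('alice42');            // true
isValidId('');                   // false: too short
isValidId('x" onclick="evil');   // false: characters outside the whitelist
isValidId('a'.repeat(17));       // false: longer than 16 characters
```

Rejecting anything that fails the check is simpler to reason about than trying to repair it.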

By the way, on line 6 of your code the value attribute is missing a pair of quotes, an easy mistake to make when you are quoting on two levels.

I can't see your actual data flow. Depending on context, this check may not be needed at all, or it may not be enough. To do a proper security review we would need more information.

In general, don't trust built-in escape or sanitize functions blindly. You need to know exactly what they do, and you need to establish that it is actually what you need. If it is not, then code your own; most of the time a simple whitelisting regex like the one I gave you works just fine.

#4

You may also use this:

function sanitarize(string) {
  const map = {
      '&': '&amp;',
      '<': '&lt;',
      '>': '&gt;',
      '"': '&quot;',
      "'": '&#x27;',
      "/": '&#x2F;',
  };
  const reg = /[&<>"'/]/ig;
  return string.replace(reg, (match)=>(map[match]));
}

The OWASP documentation suggests this mapping: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

#5

Since the text that you are escaping will appear in an HTML attribute, you must be sure to escape not only the characters that are special in HTML text (&, < and >) but also the quote characters that can end an attribute value:

var ESC_MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

function escapeHTML(s, forAttribute) {
    return s.replace(forAttribute ? /[&<>'"]/g : /[&<>]/g, function(c) {
        return ESC_MAP[c];
    });
}

Then, your escaping code becomes var user_id = escapeHTML(id, true).
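
As a rough illustration, the sketch below repeats escapeHTML so it runs on its own and builds just the "To:" field from the question. buildToField is a hypothetical helper; note that the value attribute is wrapped in double quotes, since escaping the quote characters only helps when the attribute value is itself quoted:

```javascript
// ESC_MAP and escapeHTML as given above, repeated so this snippet is self-contained.
var ESC_MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

function escapeHTML(s, forAttribute) {
    return s.replace(forAttribute ? /[&<>'"]/g : /[&<>]/g, function(c) {
        return ESC_MAP[c];
    });
}

// Hypothetical helper: builds only the "To:" input, with a quoted value attribute.
function buildToField(user_id) {
    return '<input type="text" value="' + escapeHTML(user_id, true) + '" readonly="readonly" />';
}

// An input that tries to break out of the attribute stays inside it.
var html = buildToField('x" onfocus="alert(1)');
// html is: <input type="text" value="x&quot; onfocus=&quot;alert(1)" readonly="readonly" />
```

This covers the value attribute only; it does not make the string suitable for use inside an id.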

For more information, see Foolproof HTML escaping in Javascript.

#6

You need to take extra precautions when using user-supplied data in HTML attributes, because attributes have many more attack vectors than output inside HTML tags.

The most robust way to avoid XSS in attribute values is to encode everything except alphanumeric characters: escape every other character with an ASCII value below 256 using the &#xHH; format. Unfortunately, this may cause problems in your scenario if you are using CSS classes and JavaScript to fetch those elements.
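
A minimal sketch of that rule, assuming the encoded string will be placed inside an attribute value (encodeForAttribute is a hypothetical name, not an OWASP or browser API):

```javascript
// Encodes every non-alphanumeric character with a code below 256 as &#xHH;.
function encodeForAttribute(s) {
    return s.replace(/[^a-zA-Z0-9]/gu, function(c) {
        var code = c.codePointAt(0);
        // Characters at or above 256 are passed through unchanged, per the rule above.
        return code < 256 ? '&#x' + code.toString(16).toUpperCase() + ';' : c;
    });
}

var encoded = encodeForAttribute('a b"<');   // 'a&#x20;b&#x22;&#x3C;'
```

The u flag makes the regex work on whole code points, so characters outside the Basic Multilingual Plane are not split into surrogate halves.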

OWASP has a good description of how to mitigate HTML attribute XSS:

http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_HTML_JavaScript_Data_Values
