Is it a bad idea to escape HTML before inserting it into the database instead of on output?

Date: 2022-02-15 07:37:26

I've been working on a system which doesn't allow HTML formatting. The method I currently use is to escape HTML entities before they get inserted into the database. I've been told that I should insert the raw text into the database, and escape HTML entities on output.

The similar questions I've seen here seem to cover cases where HTML can still be used for formatting, so I'm asking about a case where HTML wouldn't be used at all.

4 answers

#1


14  

You also restrict yourself by escaping before inserting into your DB. Suppose you later decide to output JSON, plain text, etc. instead of HTML.

If you have stored escaped HTML in your DB, you would first have to unescape the stored value, just to re-escape it for a different format.
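
As a rough illustration of that point, here is a minimal sketch assuming the raw text is what gets stored. The sample string and variable names are made up for the example; only the built-in PHP functions are real.

```php
<?php
// Hypothetical raw value, stored exactly as the user typed it.
$raw = 'Tom & Jerry <3 "cartoons"';

// HTML context: escape at output time.
$html = '<p>' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</p>';

// JSON context: json_encode() applies JSON's own escaping rules.
$json = json_encode(['title' => $raw]);

// Plain-text context (e.g. an SMS body): no HTML escaping at all.
$text = $raw;

// If the escaped form had been stored instead, every non-HTML output
// would need an extra decode step before it could be re-encoded.
$storedEscaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$jsonFromEscaped = json_encode([
    'title' => htmlspecialchars_decode($storedEscaped, ENT_QUOTES),
]);

echo $html, "\n", $json, "\n", $text, "\n";
```

With raw storage, each output format needs exactly one encoding step. With escaped storage, every format other than HTML needs a decode followed by an encode.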

Also see this excellent OWASP article on XSS prevention.

#2


17  

Yes, because at some stage you'll want access to the original input entered. This is because...

  • You never know how you want to display it - in JSON, in HTML, as an SMS?
  • You may need to show it back to the user as is.

I do see your point about never wanting HTML entered. What are you using to strip HTML tags? If it is a regex, then look out for confused users who might type something like this...

3<4 :->

They'll only get the 3 if it is a regex.
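
A small sketch of the difference, assuming a naive tag-stripping pattern (the question does not say which regex is actually in use, so the pattern below is only a stand-in):

```php
<?php
$input = '3<4 :->';

// Naive tag stripper (assumed pattern): removes everything from "<" to the next ">".
$stripped = preg_replace('/<[^>]*>/', '', $input);

// Escaping on output keeps every character the user typed.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // 3
echo $escaped, "\n";  // 3&lt;4 :-&gt;
```

The escaped form renders in a browser as the original 3<4 :->, while the stripped form has silently lost most of the message.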

#3


4  

  1. Another elusive issue: suppose you are entering a record with the string R&B in its title. It will be stored as R&amp;B. Now assume we have a search function which uses this SQL:

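    // Prefix search on the title column. `records` is a placeholder table name;
    // a table literally named `table` would clash with the SQL keyword.
    $query = $database->prepare('SELECT * FROM records WHERE title LIKE ?');
    $query->execute(array($searchString . '%'));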

    Now if someone searches for R&B, it won't match this row, because the row is stored as R&amp;B. The same applies to equality comparisons, sorting, etc.

    Of course, if HTML is stored, search has a problem of its own: tags are matched too, so a search for span would hit every <span>. This could be solved by delegating the search functionality to an external service like Solr, or by storing a second field with a version that is cleared of HTML tags, special characters and such (for full-text search), similar to what @limscoder suggested.

  2. One day you may expose your data via an API or something similar, and your API users may assume it is unescaped.

  3. A few months later, a new team member joins. As a well-trained developer, he always escapes HTML on output, only to find that everything is now double-escaped (e.g. titles show up as He said &quot;nuff&quot; instead of He said "nuff").

  4. The quote style of htmlspecialchars() (e.g. ENT_QUOTES, ENT_COMPAT) is going to bite you if you use anything other than the default and forget to use the same quote style for both storing and outputting.

    A similar issue happens when you use htmlentities() to store, and htmlspecialchars() to output, or vice versa (with corresponding counter-functions). Your HTML will be polluted with &Uuml;s, &Ccedil;s etc.

    These mistakes become more likely when multiple developers work on the same codebase.
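
Points 1, 3 and 4 can be reproduced in a few lines. This is only a sketch: the sample strings and variable names are invented for illustration, and the prefix check stands in for the SQL LIKE 'R&B%' comparison.

```php
<?php
// Escape-on-insert, as described in the question.
$stored = htmlspecialchars('R&B Classics', ENT_QUOTES, 'UTF-8'); // R&amp;B Classics

// 1. Search: the prefix the user types no longer matches the stored value.
$search = 'R&B';
$prefixMatches = strncmp($stored, $search, strlen($search)) === 0; // false

// 3. Double escaping: a second developer escapes on output, as usual.
$storedQuote = htmlspecialchars('He said "nuff"', ENT_QUOTES, 'UTF-8');
$rendered = htmlspecialchars($storedQuote, ENT_QUOTES, 'UTF-8');
// $rendered is: He said &amp;quot;nuff&amp;quot;
// which a browser displays as: He said &quot;nuff&quot;

// 4. Mismatched quote flags: stored with ENT_QUOTES, decoded with ENT_COMPAT.
$storedApos = htmlspecialchars("it's", ENT_QUOTES, 'UTF-8');  // it&#039;s
$decoded = htmlspecialchars_decode($storedApos, ENT_COMPAT);  // still it&#039;s

var_dump($prefixMatches);
echo $rendered, "\n", $decoded, "\n";
```

None of these cases raises an error, which is part of why they are easy to miss until users report odd-looking text or missing search results.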

#4


3  

I usually store both versions of the text. The escaped/formatted text is used when a normal page request is made to avoid the overhead of escaping/formatting every time. The original/raw text is used when a user needs to edit an existing entry, and the escaping/formatting only occurs when the text is created or changed. This strategy works great unless you have tight storage space constraints, since you will be duplicating data.
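
One possible shape for this, as a sketch only: the helper name buildRow and the column names body_raw and body_html are hypothetical, not taken from the answer.

```php
<?php
// Hypothetical helper: builds the row to persist whenever the text is created or edited.
function buildRow(string $rawText): array
{
    return [
        // Source of truth, loaded again when the user edits the entry.
        'body_raw'  => $rawText,
        // Pre-escaped copy, echoed as-is on normal page views.
        'body_html' => htmlspecialchars($rawText, ENT_QUOTES, 'UTF-8'),
    ];
}

$row = buildRow('Fish & Chips <for two>');

echo $row['body_raw'], "\n";
echo $row['body_html'], "\n";
```

The raw column still needs escaping appropriate to wherever it is output, for example when it is placed inside an edit form.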
