Should an end tag close all unclosed intervening start tags with omitted end tags?

Time: 2022-11-26 13:40:09

Am I reading the HTML 4.01 standard wrong, or is Google? In HTML 4.01, if I write:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html> <head> <body>plain <em>+em <strong>+strong </em>-em

The rendering in Google Chrome is:

plain +em +strong -em
(here "+em" is italic, "+strong" is bold italic, and "-em" is bold)

This seems to contradict the HTML 4.01 standard, which summarizes the underlying SGML rules as: “an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags”.¹

That is, the </em> end tag should close not only the <em> start tag but also the unclosed intervening <strong> start tag, and the rendering should be:

plain +em +strong -em
(here "+em" is italic, "+strong" is bold italic, and "-em" is plain, neither bold nor italic)

A commenter pointed out that it is bad practice to leave tags open, but this is only an academic example. An equally good example would be: <em> +em <strong> +strong </em> -em </strong>. It was my understanding from the HTML 4.01 standard that this code fragment would not work as intended because of the overlapping elements: the </em> end tag should implicitly close the <strong>. The fact that it did work as intended was surprising, and this is what led to my question.

And it turned out I proposed a false dichotomy in the question: neither Google nor I were reading the HTML 4.01 standard wrong. A private correspondent at w3.org pointed me to Web SGML and HTML 4.0 Explained by Martin Bryan, which explains that “[t]he parsing program will automatically close any currently open embedded element which has been declared as having omissible end-tags when it encounters an end-tag for a higher level element. (If an embedded element whose end-tag cannot be omitted is still open, however, the program will report an error in the coding.)”² (Emphasis added.) Bryan’s summarization of the SGML standard is right, and HTML 4.01’s summarization is wrong.
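
To make Bryan's wording concrete, here is a minimal Python sketch of that rule as I read it. It is not an SGML parser: the function name close_element, the stack-of-names representation, and the tiny OMISSIBLE_END_TAG set are all assumptions made for illustration.

```python
# Assumption: a hand-picked subset of element types whose end tags are
# declared omissible in the HTML 4.01 DTD (P and LI are named in the spec).
OMISSIBLE_END_TAG = {"p", "li"}


def close_element(open_elements, name):
    """Apply the end tag </name> to a stack of open element names.

    Returns the elements that were closed implicitly (innermost first).
    Raises ValueError if an intervening element needs an explicit end tag,
    leaving the stack untouched in that case.
    """
    if name not in open_elements:
        raise ValueError(f"</{name}> has no matching start tag")
    # Index of the innermost open element with this name.
    index = len(open_elements) - 1 - open_elements[::-1].index(name)
    intervening = open_elements[index + 1:]
    for inner in intervening:
        if inner not in OMISSIBLE_END_TAG:
            raise ValueError(f"</{name}> found while <{inner}> is still open")
    del open_elements[index:]  # close the named element and everything inside it
    return intervening[::-1]
```

Under these assumptions, closing div over an open p closes the p implicitly, while closing em over an open strong is reported as an error rather than silently closing the strong.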

6 answers

#1


4  

The statement quoted from the HTML 4.01 specification is very obscure, or just plain wrong on all accounts. HTML 4.01 has specific rules for end tag omission, and these rules depend on the element. For example, the end tag of a p element may be omitted, whereas the end tag of an em element may never be omitted. The statement in the specification probably tries to say that an end tag implicitly closes any inner elements that have not yet been closed, to the extent that end tag omission is allowed.

No browser has ever implemented HTML 4.01 (or any earlier HTML specification) as defined, with the SGML features that are formally part of it. Anything that the HTML specifications say about SGML should be taken as just theoretical until proven otherwise.

HTML5 doesn’t change the rules of the game in this respect, except that it writes down the error handling rules. In simple issues like these, the rules just make the traditional browser behavior a norm. They are tag-soup-oriented, treating tags more or less as formatting commands: <em> means “italicize,” </em> means “stop italicizing,” etc. But HTML5 also takes measures to define error handling more formally so that despite such tag soup usage, it is well-defined what document tree in the DOM will be constructed.
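
The "tags as formatting commands" view can be sketched in a few lines of Python. This is only a toy model of the traditional behavior, not the HTML5 tree-construction algorithm; the function name format_runs and the restriction to em and strong are assumptions for illustration.

```python
import re

# A start or end tag (attributes ignored), or a run of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


def format_runs(markup, formatting=("em", "strong")):
    """Return (text, active formatting tags) pairs for a markup fragment."""
    active = set()
    runs = []
    for closing, name, text in TOKEN.findall(markup):
        if text:
            runs.append((text, frozenset(active)))
            continue
        name = name.lower()
        if name not in formatting:
            continue  # this toy model ignores every non-formatting tag
        if closing:
            active.discard(name)  # "stop italicizing", "stop bolding", ...
        else:
            active.add(name)      # "italicize", "bold", ...
    return runs
```

Applied to the fragment from the question, "plain <em>+em <strong>+strong </em>-em", the model yields "plain " with no formatting, "+em " with em, "+strong " with em and strong, and "-em" with strong only, which matches what Chrome renders.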

#2


6  

Some tags are allowed to be omitted (such as the end tag for <p> or the start and end tags for <body>), and some are not (such as the end tag for <strong>). It is the former that the section of the spec you quote is referring to. You can identify them in the DTD, where an "O" marks a tag that may be omitted and a dash marks one that is required:

<!ELEMENT P - O (%inline;)*            -- paragraph -->
          ^ a P element
            ^ requires a start tag
              ^ has optional end tag
                 ^ contains zero or more inline things
                                       ^ Comment: Is a paragraph
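
The same reading of a declaration can be done mechanically. Below is a small Python sketch that pulls the two tag-omission flags out of a declaration of the simple form shown above; the function name tag_omission and the returned dictionary keys are made up for illustration, and the regular expression makes no attempt to cover the full SGML declaration syntax.

```python
import re

# Simple form only: "<!ELEMENT NAME start-flag end-flag content ...".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+(.*)", re.S)


def tag_omission(declaration):
    """Report whether the start and end tags of an element may be omitted."""
    match = ELEMENT_DECL.match(declaration.strip())
    if match is None:
        raise ValueError("not a simple ELEMENT declaration")
    name, start_flag, end_flag, _content = match.groups()
    return {
        "element": name,
        "start_tag_optional": start_flag == "O",  # "O" = may be omitted
        "end_tag_optional": end_flag == "O",      # "-" = required
    }
```

For the P declaration above, this reports a required start tag and an optional end tag.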

What you have is not an HTML document with an omitted tag, but an invalid pseudo-HTML document that browsers will try to perform error recovery on.

The specification (for HTML 4) does not describe how to perform error recovery; that is left up to browsers.

#3


1  

The specification says that:

Some HTML element types allow authors to omit end tags (e.g., the P and LI element types).

This:

Please consult the SGML standard for information about rules governing elements (e.g., they must be properly nested, an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags (section 7.5.1), etc.).

applies to elements which can have omitted end tags.

If you look at the P element spec, you will see:

Start tag: required, End tag: optional

So, when you use this:

<DIV>
<P>This is the paragraph.
</DIV>

The P element will be automatically closed.

But if you look at the EM spec, you will see:

Start tag: required, End tag: required

So the automatic-closing rule does not apply here, because the HTML is not valid.

Curiously, all browsers showed the same behavior with that kind of invalid HTML.

#4


1  

All modern browsers use an HTML5 parser (even for HTML 4.01 content), so the parsing rules of HTML5 apply. You can find more information in the "Parsing HTML documents" section of the HTML5 spec.

HTML Outline

  • HTML
    • HEAD
      • #text " " ()
    • BODY
      • #text "plain " ()
      • EM
        • #text "+em " (italic)
        • STRONG
          • #text "+strong " (bold/italic)
      • STRONG
        • #text "-em" (bold)

#5


0  

If you run your HTML through http://validator.w3.org/check, it will flag this HTML as invalid.

If your HTML is invalid, all bets are off, and different browsers may render your HTML differently.

#6


0  

If you look at the DOM in Chrome by right-clicking and choosing "Inspect element", you'll be able to deduce that, since your tags do not match up, it applied an algorithm to decide where you messed up. Technically, it does close the strong tag at the correct place. However, it decides that you were probably trying to make both pieces of text bold, so it puts the last "-em" in an entirely new, extra "strong" element while keeping the "+strong" in its own "strong" element. It looks to me like the Chrome team decided it is statistically likely that you want both things to be bold.
