Valid content types for XML, HTML and XHTML documents

Time: 2021-08-21 22:30:03

What are the correct content-types for XML, HTML and XHTML documents?

I need to write a simple crawler that only fetches these kinds of files.

Nowadays a URL such as http://example.net/index.html may actually serve, for example, a JPEG file because of mod_rewrite, so I need to check the Content-Type response header and compare it against a list of allowed content types.

Where can I get such a list from?

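A minimal sketch of the check described in the question, assuming Python and only its standard library (`urllib.request`). The names `media_type` and `fetch_if_allowed` are hypothetical, and the caller supplies the set of allowed types. The response body is only read when the header names an allowed type.

```python
from __future__ import annotations

import urllib.request


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def fetch_if_allowed(url: str, allowed: set[str], timeout: float = 10.0) -> bytes | None:
    """Return the body of `url` if its Content-Type is in `allowed`, else None.

    Hypothetical helper: urlopen raises HTTPError/URLError on failures,
    which the caller is expected to handle.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        ctype = media_type(resp.headers.get("Content-Type", ""))
        if ctype not in allowed:
            return None  # e.g. a JPEG served from an .html URL
        return resp.read()
```

Comparing on the header rather than the URL's extension is what catches the mod_rewrite case.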

1 answer

#1


145  

HTML: text/html, full-stop.

XHTML: application/xhtml+xml, or text/html only if the document follows the HTML compatibility guidelines. See the W3C XHTML Media Types note.

XML: text/xml, application/xml (RFC 2376, since obsoleted by RFC 3023 and then by RFC 7303).

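An exact-match allow-list for the types listed above might look like this. It is a minimal sketch, assuming Python; `ALLOWED_TYPES` and `is_allowed` are hypothetical names. Parameters like `; charset=utf-8` are stripped and the comparison is case-insensitive, because media type names are case-insensitive.

```python
# Media types named in the answer for HTML, XHTML and XML documents.
ALLOWED_TYPES = frozenset({
    "text/html",              # HTML (and XHTML following the compatibility guidelines)
    "application/xhtml+xml",  # XHTML
    "text/xml",               # XML
    "application/xml",        # XML
})


def is_allowed(content_type: str) -> bool:
    """Return True if a Content-Type header value names one of ALLOWED_TYPES."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_TYPES
```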

There are also many other media types based on XML, for example application/rss+xml or image/svg+xml. It's a safe bet that any unrecognised but registered type ending in +xml is XML-based. See the IANA list of registered media types for those ending in +xml.

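The +xml rule of thumb amounts to a suffix check on the subtype. The sketch below assumes Python; `is_xml_based` is a hypothetical name. It is a heuristic only: it cannot tell registered types from unregistered ones, so it simply trusts any +xml suffix.

```python
def is_xml_based(content_type: str) -> bool:
    """Heuristic: plain XML types, or any subtype ending in '+xml'."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in {"text/xml", "application/xml"}:
        return True
    _, slash, subtype = media_type.partition("/")
    # Require a 'type/subtype' shape before looking at the suffix.
    return bool(slash) and subtype.endswith("+xml")
```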

(For unregistered x- types, all bets are off, but you'd hope +xml would be respected.)
