How do I convert an HTML string into a DOM object in a Firefox extension?

Date: 2022-09-02 09:32:26

I'm downloading a web page (tag soup HTML) with XMLHttpRequest, and I want to turn the output into a DOM object that I can then run XPath queries on. How do I convert a string into a DOM object?

It appears that the general solution is to create a hidden iframe and throw the contents of the string into that. There has been talk of updating DOMParser to support text/html but as of Firefox 3.0.1 you still get an NS_ERROR_NOT_IMPLEMENTED if you try.

Is there any option besides using the hidden iframe trick? And if not, what is the best way to do the iframe trick so that your code works outside the context of any currently open tabs (so that closing tabs won't screw up code, etc)?

This is an example of why I'm looking for a solution other than the iframe hack: if I have to write all that code to get a robust solution, I'd rather keep looking for something else.

5 answers

#1


7  

Ajaxian actually had a post today on inserting HTML into, and retrieving it from, an iframe. You can probably use the JS snippet they posted there.

As for handling the browser or tab being closed, you can attach a handler to the onbeforeunload event (http://msdn.microsoft.com/en-us/library/ms536907(VS.85).aspx) and do whatever cleanup you need there.
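
A rough sketch of how the two pieces could fit together, not the Ajaxian snippet itself. It assumes an ordinary HTML page window (a XUL/chrome window would need a different iframe setup), and the names `parseInHiddenIframe` and `dispose` are made up for illustration. Note that `write()` will run any scripts contained in the string, so this is only reasonable for content you trust.

```javascript
// Write an HTML string into a hidden iframe and return its document.
// `win` is the window that hosts the iframe (passed in explicitly so the
// function does not depend on a global `window`).
function parseInHiddenIframe(win, html) {
  var doc = win.document;
  var iframe = doc.createElement("iframe");
  iframe.style.display = "none";            // keep it out of sight
  doc.documentElement.appendChild(iframe);  // must be attached to get a document

  var inner = iframe.contentDocument;
  inner.open();
  inner.write(html);                        // the browser's HTML parser handles tag soup
  inner.close();

  // Remove the iframe again; safe to call more than once.
  function cleanup() {
    if (iframe.parentNode) {
      iframe.parentNode.removeChild(iframe);
    }
  }

  // Tidy up if the hosting window is about to go away.
  win.addEventListener("beforeunload", cleanup, false);

  return { document: inner, dispose: cleanup };
}
```

In a page you would call `parseInHiddenIframe(window, request.responseText)`, query the returned `document`, and call `dispose()` once you are done with it.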

#2


4  

Try this:

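var request = new XMLHttpRequest();

request.open( 'GET', url );  // url: address of the page to fetch
// Ask the browser to parse the response as XML regardless of the
// Content-Type header; this has to be set before send().
request.overrideMimeType( 'text/xml' );
request.onreadystatechange = process;
request.send( null );

function process() {
    // readyState 4 means the request has completed
    if ( request.readyState == 4 && request.status == 200 ) {
        // Parsed DOM document built from the response body
        var xml = request.responseXML;
    }
}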

Note the use of overrideMimeType and responseXML. A readyState of 4 means the request has completed.
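
One thing to watch for with this approach: responseXML is only useful when the response actually parses as well-formed XML, which tag soup usually does not. A small guard makes that failure visible. This is only a sketch: `isUsableXml` is a made-up helper name, and the `parsererror` check assumes Gecko's habit of reporting an XML parse failure as a document whose root element is named `parsererror` (other engines, or other versions, may simply give you null).

```javascript
// Returns true when `doc` looks like a successfully parsed XML document.
function isUsableXml(doc) {
  // No document at all: the body could not be parsed as XML.
  if (!doc || !doc.documentElement) {
    return false;
  }
  // Assumption: a failed parse is reported as a <parsererror> root element.
  return doc.documentElement.nodeName !== "parsererror";
}
```

Inside `process()` you could then write `if ( isUsableXml( request.responseXML ) ) { ... }` and fall back to one of the other approaches when it returns false.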

#3


1  

Try creating a div:

var div = document.createElement( 'div' );

Then assign the tag soup HTML to the div's innerHTML. The browser should parse it into a DOM tree, which you can then query.

The innerHTML property takes a string that specifies a valid combination of text and elements. When the innerHTML property is set, the given string completely replaces the existing content of the object. If the string contains HTML tags, the string is parsed and formatted as it is placed into the document.
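
Put together, the idea might look like the sketch below. The function name `queryHtmlViaDiv` is hypothetical, and `doc` is passed in rather than taken from the global `document` so the function can be tried with any document object. The div is never attached to the page, so a relative XPath (starting with `.`) is evaluated against it. Keep in mind that `<html>`, `<head>` and `<body>` wrappers in the string are dropped when it is parsed inside a div, so queries that rely on them will not match.

```javascript
// Parse `html` by assigning it to a detached div, then run a relative
// XPath expression against that div and return the matches as an array.
function queryHtmlViaDiv(doc, html, xpath) {
  var div = doc.createElement("div");
  div.innerHTML = html; // the browser's HTML parser builds the child nodes

  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var result = doc.evaluate(xpath, div, null, 7, null);

  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

For example, `queryHtmlViaDiv(document, request.responseText, ".//a")` would collect the links found in the downloaded markup.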

#4


1  

So you want to download a web page as an XML object using JavaScript, but you don't want to use a web page? Since you have no control over what the user will do (closing tabs, windows, or whatnot), you would need to do this in something like an OS X Dashboard widget or a separate application. A Firefox extension would also work, unless you have to worry about the user closing the browser.

#5


1  

Is there any option besides using the hidden iframe trick?

Unfortunately, no, not now. Otherwise the microsummary code you point to would use it instead.

And if not, what is the best way to do the iframe trick so that your code works outside the context of any currently open tabs (so that closing tabs won't screw up code, etc)?

The code you quoted uses the most recent browser window, so closing tabs won't affect parsing. Closing that browser window will abort your load, but you can deal with that (for example, detect that the load was aborted and restart it in another window), and it doesn't happen very often.

You need a DOM window for the iframe to work properly, so there's no clean solution at the moment (if you're keen on using the mozilla parser).
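
For readers on later Firefox releases: DOMParser did eventually gain `text/html` support (around Firefox 12, as far as I know), which avoids the iframe entirely. A minimal sketch under that assumption follows. The function name `queryHtmlString` and the optional `ParserCtor` argument are invented for illustration; the argument only exists so the function can be exercised where no global `DOMParser` is defined.

```javascript
// Parse an HTML string with DOMParser and return the nodes matching an XPath.
function queryHtmlString(html, xpath, ParserCtor) {
  // Use the supplied constructor, else the global DOMParser if there is one.
  var Ctor = ParserCtor || (typeof DOMParser !== "undefined" ? DOMParser : null);
  if (!Ctor) {
    throw new Error("No DOMParser implementation available");
  }

  // Assumption: the parser supports the "text/html" type.
  var doc = new Ctor().parseFromString(html, "text/html");

  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var result = doc.evaluate(xpath, doc, null, 7, null);

  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

With the XMLHttpRequest from the question, that would be something like `queryHtmlString(request.responseText, "//a")`. On a version without `text/html` support the `parseFromString` call fails, as described in the question, and the iframe approach is still needed.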
