搜索引擎优化-如何要求爬虫在数据加载前等待?

时间:2022-04-19 11:19:06

I'm using a mvvc framework (Angular) and having some trouble getting the site data indexed. All the static data is crawled fine but dynamic data from a cloud db is missed.

我使用的是mvvc框架(角度的),而且很难将站点数据编入索引。所有静态数据都被很好地抓取,但是来自云数据库的动态数据却被遗漏了。

Is there any way to politely ask a crawler to wait a few hundred ms before going at it?

有什么方法可以礼貌地让爬虫等几百毫秒后再行动呢?

2 个解决方案

#1


1  

There is no way to tell a spider to wait. This would be counter-productive since their job is to index data as quickly as possible and each wait would accumulate into days/weeks/months of delays. (Note that Google has explored some javascript rendering, but that won't help with XHR content).

没有办法让蜘蛛等待。这将会产生反作用,因为他们的工作是尽可能快地索引数据,而每次等待都会累积成几天/几周/几个月的延迟。(请注意,谷歌已经研究了一些javascript呈现,但这对XHR内容没有帮助)。

The correct answer is to explore Making AJAX Applications Crawlable. The gist of this approach is that you utilize a tool like prerender.io during your deploy process to pre-render dynamic content. Then you list that content in your sitemap and either utilize the _escaped_fragment_ rewrites at your server or the meta tag as explained here (from Getting Started):

正确的答案是探索使AJAX应用程序能够爬行。这种方法的要点是您可以使用prerender之类的工具。在您的部署过程中,io以预呈现动态内容。然后在站点地图中列出这些内容,或者使用服务器上的_escaped_fragment_重写,或者使用这里解释的meta标记(从开始):

In order to make pages without hash fragments crawlable, you include a special meta tag in the head of the HTML of your page. The meta tag takes the following form:

为了使页面不受哈希片段的影响,您需要在页面的HTML头部包含一个特殊的元标记。元标签采用以下形式:

<meta name="fragment" content="!">

In both cases, you still have to pre-render dynamic content to cached HTML pages and make those available to the search engine when it requests content from your server.

在这两种情况下,仍然需要将动态内容预先呈现到缓存的HTML页面中,并在搜索引擎从服务器请求内容时将这些内容提供给搜索引擎。

#2


1  

Best way to put noindex, nofollow that time.

最好的方法是不要写索引,不要跟着时间走。

after load data fully, you can remove that tags.

在完全加载数据之后,您可以删除该标记。

#1


1  

There is no way to tell a spider to wait. This would be counter-productive since their job is to index data as quickly as possible and each wait would accumulate into days/weeks/months of delays. (Note that Google has explored some javascript rendering, but that won't help with XHR content).

没有办法让蜘蛛等待。这将会产生反作用,因为他们的工作是尽可能快地索引数据,而每次等待都会累积成几天/几周/几个月的延迟。(请注意,谷歌已经研究了一些javascript呈现,但这对XHR内容没有帮助)。

The correct answer is to explore Making AJAX Applications Crawlable. The gist of this approach is that you utilize a tool like prerender.io during your deploy process to pre-render dynamic content. Then you list that content in your sitemap and either utilize the _escaped_fragment_ rewrites at your server or the meta tag as explained here (from Getting Started):

正确的答案是探索使AJAX应用程序能够爬行。这种方法的要点是您可以使用prerender之类的工具。在您的部署过程中,io以预呈现动态内容。然后在站点地图中列出这些内容,或者使用服务器上的_escaped_fragment_重写,或者使用这里解释的meta标记(从开始):

In order to make pages without hash fragments crawlable, you include a special meta tag in the head of the HTML of your page. The meta tag takes the following form:

为了使页面不受哈希片段的影响,您需要在页面的HTML头部包含一个特殊的元标记。元标签采用以下形式:

<meta name="fragment" content="!">

In both cases, you still have to pre-render dynamic content to cached HTML pages and make those available to the search engine when it requests content from your server.

在这两种情况下,仍然需要将动态内容预先呈现到缓存的HTML页面中,并在搜索引擎从服务器请求内容时将这些内容提供给搜索引擎。

#2


1  

Best way to put noindex, nofollow that time.

最好的方法是不要写索引,不要跟着时间走。

after load data fully, you can remove that tags.

在完全加载数据之后,您可以删除该标记。