How do I prevent search engines from indexing a single page of my website?

Time: 2022-11-11 23:02:47

I don't want the search engines to index my imprint page. How could I do that?

7 solutions

#1


27  

You need a simple robots.txt file. Basically, it's a text file that asks search engine crawlers to stay away from particular pages.
You don't need to reference it in the header of your page; as long as it's in the root directory of your website, crawlers will pick it up.
Create it in the root folder of your website and put the following text in it:

User-Agent: *
Disallow: /imprint-page.html

Note that you'd replace imprint-page.html in the example with the actual name of the page (or the directory) that you want to keep from being indexed.

That's it! If you want to get more advanced, you can check out here, here, or here for a lot more info. Also, you can find free tools online that will generate a robots.txt file for you (for example, here).
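
As a quick sanity check, the rules above can be tested locally before uploading the file. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `www.example.com` host, the `index.html` page, and the helper name `is_allowed` are assumptions made for illustration, and `imprint-page.html` is the placeholder name from the answer.

```python
from urllib.robotparser import RobotFileParser

# Rules from the answer above; the file name is the answer's placeholder.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /imprint-page.html
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed for this check.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on an example host.
    for url in (
        "https://www.example.com/imprint-page.html",
        "https://www.example.com/index.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(url, verdict)
```

This only shows how a well-behaved crawler would read the rules; it does not prove what any particular search engine will do with the page.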

#2


29  

You can also add the following meta tag to the HEAD of that page:

<meta name="robots" content="noindex,nofollow" />
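
To confirm that a page actually carries the tag, something like the following could be used. It is a minimal sketch based on Python's standard `html.parser`; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page markup are all hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical imprint page using the tag from the answer above.
SAMPLE_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex,nofollow" />'
    "</head><body>Imprint</body></html>"
)

if __name__ == "__main__":
    print("noindex" in robots_directives(SAMPLE_PAGE))
```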

#3


5  

You can set up a robots.txt file to try to tell search engines to ignore certain directories.

See here for more info.

Basically:

User-agent: *
Disallow: /[directory or file here]

#4


3  

<meta name="robots" content="noindex, nofollow">

Just include this line in the <head> section of your page. The reason I recommend this: you might use a robots.txt file to hide URLs such as login pages or other protected URLs that you don't want to show to other people or to search engines.

But anyone can simply open the robots.txt file directly on your website and see which of your URLs are meant to be secret. So what is the logic behind using a robots.txt file for this?

The better approach is to use the meta tag above, so that you don't expose those URLs to anyone.
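
To illustrate how little effort that takes, here is a minimal sketch that lists every path a robots.txt file asks crawlers to avoid. The helper name `disallowed_paths` and the paths in the sample file are hypothetical; on a real site the text would simply be whatever is served at /robots.txt.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every non-empty Disallow value found in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, used only for illustration.
SAMPLE = """\
User-agent: *
Disallow: /admin/login.html   # hypothetical path
Disallow: /private/
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```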

#5


3  

Nowadays, the best method is to use a robots meta tag and set it to noindex, follow:

<meta name="robots" content="noindex, follow">

#6


0  

Create a robots.txt file and set the controls there.

Here are the docs for Google: http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

#7


0  

Suppose a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks http://www.example.com/robots.txt. In that file you can explicitly disallow pages, for example:

User-agent: *
Disallow: /~joe/junk.html

Please visit the link below for details on robots.txt.
