How can I crawl my own website?

Date: 2022-10-31 19:00:46

I've inherited an old Classic ASP website to modify. Although not requested up-front, I'd like to delete a bunch of the old "orphaned" pages.


For some reason, the old developer decided to create multiple instances of each file instead of using source control (e.g. index-t.asp, index-feb09.asp, index-menutest.asp).


I'm wondering if anyone knows of a program or website that can crawl my own site for me? It probably needs to crawl the public site, since there are lots of include files. Also, some of the URLs are relative and some are absolute.

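If a ready-made tool doesn't fit, a same-host crawler is also straightforward to sketch in Python's standard library. This is a minimal illustration, not any particular tool's behavior: the `crawl` and `extract_links` names are my own, it assumes the site serves plain HTML over HTTP, and `urljoin` handles the mix of relative and absolute URLs mentioned above.

```python
# Minimal same-host crawler sketch (hypothetical helper names; assumes an
# HTML site reachable over HTTP).
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects <a href> targets, resolving relative URLs against the page URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(urljoin(self.base_url, value))

def extract_links(html, base_url):
    """Return the set of absolute URLs linked from one HTML page."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links

def crawl(start_url):
    """Visit every page reachable from start_url on the same host."""
    host = urlparse(start_url).netloc
    seen, queue = set(), [start_url]
    while queue:
        url = queue.pop()
        if url in seen or urlparse(url).netloc != host:
            continue  # skip already-visited pages and external links
        seen.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", "replace")
        except Exception:
            continue  # broken link: page is linked but not fetchable
        queue.extend(extract_links(html, url) - seen)
    return seen
```

Note the caveat about include files: pages pulled in only via server-side `<!--#include-->` directives never appear as links, so a crawl alone would wrongly flag them as orphans; those need a source-level scan as well.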

4 solutions

#1


My favorite tool is Xenu.


#2


There's also the W3C link checker: http://validator.w3.org/checklink


#3


You should never let a once-valid URL go stale. Bad web developer! No biscuit!!


#4


You should consider:


  1. Putting the entire existing site into source control, then
  2. Deleting the extra pages and seeing who complains
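Once the site is under source control (so deletions are reversible), the crawl results can drive step 2. A possible sketch, assuming you have the set of crawled URLs and a local copy of the site root (`orphan_candidates` and both parameters are hypothetical names):

```python
# Sketch: diff .asp files on disk against the URL paths a crawl reached,
# to produce a list of orphan candidates for review before deletion.
from pathlib import Path
from urllib.parse import urlparse

def orphan_candidates(site_root, reachable_urls):
    """Files present on disk but never linked from any crawled page."""
    on_disk = {p.relative_to(site_root).as_posix()
               for p in Path(site_root).rglob("*.asp")}
    linked = {urlparse(u).path.lstrip("/") for u in reachable_urls}
    return sorted(on_disk - linked)
```

Treat the output as candidates only: include files and pages reached via form posts or redirects won't show up as links, so review each entry before deleting.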
