Mirroring a wiki with wget without fetching diff pages (exclude by regex?)

Time: 2022-12-20 15:21:45

I'm trying to download a static mirror of a wiki using wget. I only want the latest version of each article (not the full history or diffs between versions). It would be easy to just download the whole thing and delete unnecessary pages later, but doing so would take too much time and place an unnecessary strain on the server.


There are a number of pages I clearly don't need such as:



Is there a way to tell wget not to download and recurse on URLs that have 'action=diff' in them? Or otherwise exclude URLs that match some regex?


1 Answer


Pass wget a reject list with -R (--reject), a comma-separated list of patterns:

-R '*action=diff*,*action=edit*'
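For the regex half of the question: wget 1.14 and later also have a --reject-regex option, which is matched against the complete URL. Before starting a long crawl it can help to dry-run the pattern against a few URLs. The sketch below does that with grep -E, assuming wget's default POSIX regex type behaves like grep's extended syntax for this simple pattern. The host wiki.example.org and the sample URLs are hypothetical placeholders, not taken from the question.

```shell
#!/bin/sh
# Candidate pattern (POSIX extended regex) for the URLs to skip.
pattern='action=(diff|edit)'

# Hypothetical sample URLs; wiki.example.org is a placeholder host.
urls='https://wiki.example.org/index.php?title=Main_Page
https://wiki.example.org/index.php?title=Main_Page&action=edit
https://wiki.example.org/index.php?title=Main_Page&diff=prev&oldid=5&action=diff'

# grep -v keeps only the lines that do NOT match,
# i.e. the URLs that would still be fetched.
kept=$(printf '%s\n' "$urls" | grep -Ev "$pattern")
printf '%s\n' "$kept"

# With wget 1.14 or later, the same pattern could then be passed as:
#   wget --mirror --reject-regex "$pattern" https://wiki.example.org/
```

Only the plain article URL should survive the filter. If an edit or diff URL is still printed, adjust the pattern before pointing wget at the real server.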

