How should I handle autolinking in wiki page content?

Time: 2022-10-19 22:32:50

By autolinking I mean the process by which wiki links inlined in page content are turned into either a hyperlink to the page (if it exists) or a create link (if it doesn't).

With the parser I am using, this is a two-step process. First, the page content is parsed and all links to wiki pages are extracted from the source markup. Then I feed an array of the existing pages back to the parser before the final HTML markup is generated.

What is the best way to handle this process? It seems I would need to keep a cached list of every page on the site, rather than extracting the index of page titles each time. Or is it better to check each link separately to see whether it exists? That could mean a lot of database lookups if the list isn't cached. Would this still be viable for a larger wiki site with thousands of pages?
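One possible middle ground between caching every title and doing one lookup per link is a single batched query for only the titles a page actually references. The sketch below assumes a [[Page Title]] link syntax, a hypothetical `pages` table with a `title` column, and SQLite as the database; none of these come from the question, so adapt them to your parser and schema.

```python
import re
import sqlite3

# Assumed link syntax: [[Page Title]]
WIKILINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def extract_links(markup):
    """Step 1: collect the distinct page titles referenced in the markup."""
    return {m.group(1).strip() for m in WIKILINK.finditer(markup)}


def existing_titles(conn, titles):
    """Step 2: one query returning only the referenced titles that exist."""
    if not titles:
        return set()
    placeholders = ",".join("?" for _ in titles)
    rows = conn.execute(
        f"SELECT title FROM pages WHERE title IN ({placeholders})",
        sorted(titles),
    )
    return {row[0] for row in rows}


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO pages VALUES (?)", [("Home",), ("Parser",)])

links = extract_links("See [[Home]] and [[Roadmap]], then [[Home]] again.")
found = existing_titles(conn, links)   # the array to feed back to the parser
missing = links - found                # these would become create links
```

The cost is one indexed query per rendered page regardless of how many pages the wiki holds. Databases cap the number of bound parameters per statement, so a page with a very large number of links would need the query split into chunks.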

6 solutions

#1


1  

In my own wiki I check every link without caching, but my wiki is only used by a few people internally. You should benchmark this kind of thing.

#2


1  

In my own wiki system the caching is pretty simple: when a page is updated, it checks the links to make sure they are valid and applies the correct formatting/location to those that aren't. The cached page is saved as an HTML page in my cache root.

Pages that are marked as 'not created' during the page update are inserted into a database table that holds the page along with a CSV of the pages that link to it.

When someone creates that page, it initiates a scan that goes through each linking page and re-caches it with the correct link and formatting.

If you aren't interested in highlighting non-created pages, however, you could just check whether the page exists when someone attempts to access it, and redirect to the creation page if it doesn't. Then just link to pages as normal in other articles.
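The bookkeeping described in this answer can be sketched roughly as follows. This is not the answerer's code: it swaps the CSV column for one row per (missing page, linking page) pair, and the table name `missing_links` and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (missing page, page that links to it); a normalized
# stand-in for the CSV of linking pages described above.
conn.execute(
    "CREATE TABLE missing_links ("
    " target TEXT NOT NULL, source TEXT NOT NULL,"
    " PRIMARY KEY (target, source))"
)


def record_missing(conn, source, targets):
    """On saving `source`: remember which not-yet-created pages it links to."""
    conn.executemany(
        "INSERT OR IGNORE INTO missing_links (target, source) VALUES (?, ?)",
        [(target, source) for target in targets],
    )


def pages_to_recache(conn, created):
    """On creating `created`: return the pages linking to it, then forget them."""
    rows = conn.execute(
        "SELECT source FROM missing_links WHERE target = ? ORDER BY source",
        (created,),
    ).fetchall()
    conn.execute("DELETE FROM missing_links WHERE target = ?", (created,))
    return [row[0] for row in rows]


record_missing(conn, "Home", ["Roadmap", "FAQ"])
record_missing(conn, "About", ["Roadmap"])
stale = pages_to_recache(conn, "Roadmap")  # pages whose cached HTML is stale
```

Each page returned by `pages_to_recache` would then be re-rendered and written back to the cache root, so the expensive work only happens when a previously missing page is created.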

#3


1  

I tried to do this once and it was a nightmare! My solution was a nasty loop in a SQL procedure, and I don't recommend it.

One thing that gave me trouble was deciding which link to use for a multi-word phrase. Say you had some text reading "I am using Stack Overflow" and your wiki had three pages called "stack", "overflow" and "stack overflow". Which part of the phrase gets linked to which page? It will happen!
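One common way to resolve this ambiguity, though not necessarily what the answerer ended up doing, is to prefer the longest matching title. The sketch below builds a single regular expression with the longer titles listed first; the `/wiki/` URL scheme and the function names are assumptions, and HTML escaping is left out for brevity.

```python
import re


def build_matcher(titles):
    """Compile one pattern that tries longer titles before shorter ones."""
    ordered = sorted(titles, key=len, reverse=True)
    alternatives = "|".join(re.escape(title) for title in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def autolink(text, titles):
    """Wrap each matched title in an anchor; matches never overlap."""
    if not titles:
        return text
    by_lower = {title.lower(): title for title in titles}

    def replace(match):
        title = by_lower[match.group(0).lower()]
        href = "/wiki/" + title.replace(" ", "_")  # assumed URL scheme
        return f'<a href="{href}">{match.group(0)}</a>'

    return build_matcher(titles).sub(replace, text)


titles = ["stack", "overflow", "stack overflow"]
html = autolink("I am using Stack Overflow", titles)
# html == 'I am using <a href="/wiki/stack_overflow">Stack Overflow</a>'
```

Because the regex engine takes the first alternative that matches at a given position, "Stack Overflow" is linked as a whole, while a sentence containing only "stack" still links to the shorter page.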

#4


0  

My idea would be to query the titles with something like SELECT title FROM articles and simply check whether each wikilink is in that array of strings. If it is, you link to the page; if not, you link to the create page.
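A minimal sketch of this idea, assuming SQLite and hypothetical `/wiki/` and `/create/` URL schemes. Loading the titles into a set rather than a list keeps each membership check constant-time even with thousands of pages.

```python
import sqlite3


def load_titles(conn):
    """Fetch every article title once and keep them in a set."""
    return {row[0] for row in conn.execute("SELECT title FROM articles")}


def link_for(title, titles):
    """Return a view URL for existing pages and a create URL otherwise."""
    slug = title.replace(" ", "_")
    return f"/wiki/{slug}" if title in titles else f"/create/{slug}"


# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT PRIMARY KEY)")
conn.execute("INSERT INTO articles VALUES ('Home')")
titles = load_titles(conn)
```

With this data, `link_for("Home", titles)` gives `/wiki/Home` and `link_for("Road Map", titles)` gives `/create/Road_Map`. The set could be cached between requests and refreshed whenever a page is created or deleted.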

#5


0  

In a personal project I made with Sinatra, after running the content through Markdown I do a gsub to replace wiki words and other things (like [[Here is my link]] and whatnot) with proper links, checking for each one whether the page exists and linking to create or view accordingly.
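The answer describes Ruby's gsub; a rough Python analogue uses `re.sub` with a replacement function. The `/view/` and `/create/` paths, the `new` CSS class, and the `page_exists` callback are all assumptions made for this sketch.

```python
import re
from html import escape
from urllib.parse import quote

# Assumed link syntax: [[Here is my link]]
WIKILINK = re.compile(r"\[\[(.+?)\]\]")


def link_wiki_words(html_text, page_exists):
    """Replace each [[Title]] with a view or create link, checked per match."""

    def replace(match):
        title = match.group(1).strip()
        if page_exists(title):
            return f'<a href="/view/{quote(title)}">{escape(title)}</a>'
        return f'<a href="/create/{quote(title)}" class="new">{escape(title)}</a>'

    return WIKILINK.sub(replace, html_text)


existing = {"Home"}  # stand-in for a real lookup
out = link_wiki_words(
    "<p>[[Home]] and [[Here is my link]]</p>", existing.__contains__
)
```

Here `page_exists` is called once per link, which matches the per-link check described above. Passing a set's `__contains__` method, as in the usage line, turns it into an in-memory lookup instead.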

It's not the best, but I didn't build this app with caching or speed in mind. It's a simple, low-resource wiki.

If speed were more important, you could wrap the app in something that caches it. For example, Sinatra can be wrapped with Rack caching.

#6


0  

Based on my experience developing Juli, an offline personal wiki with autolinking, generating static HTML may solve your problem.

As you suspect, generating an autolinked wiki page takes a long time. With static HTML generation, however, autolinked pages only need to be regenerated when a wiki page is added or deleted (in other words, not when a page is merely updated), and the regeneration can run in the background, so I usually don't mind how long it takes. Users only ever see the generated static HTML.
