如何构建像*这样的标签系统

时间:2022-05-29 17:08:49

I'm implementing a tag system similar to * tag system but I just wonder How-to get related tags and define the relationships weights between tags like the list of "Related Tags" in any tag page like this https://*.com/questions/tagged/php they define the relationship weight by the co-occurrence between 2 or more tags

我实现一个标签系统类似于*标签系统,但我只是想知道如何得到相关标记和定义标签之间的权重关系中的“相关标签”列表这样的任何标签页面https://*.com/questions/tagged/php定义重量之间的共生关系的2个或更多的标签

How I can do this in PHP/MySQl to define the most related tags for tag "X" and keep all weights up to date as users add more and more posts/questions ?

如何在PHP/MySQl中定义标记“X”最相关的标记,并在用户添加越来越多的帖子/问题时保持所有权重的最新?

3 个解决方案

#1


2  

You probably want to look into statistics for this:

你可能想看看统计数据:

  1. given a tag X
  2. 给定一个标记X
  3. check all other tags Y
  4. 检查所有其他标签Y
  5. count how often Y and X show up at the same time
  6. 计算Y和X同时出现的频率
  7. divide by how often Y shows up
  8. 除以Y出现的频率。
  9. ???
  10. ? ? ?
  11. Profit!!!
  12. 利润! ! !

As for more information on step 5: This information only changes very slowly, so you can really cache this stuff and only recreate it when you have time.

关于第5步的更多信息:这些信息的变化非常缓慢,因此您可以真正地缓存这些内容,并且只有在您有时间时才能重新创建它。

What you want in the end is a relation

你最终想要的是一种关系

conditional_probability(X, Y, P)

Which tells you how probable (P) tag Y is, given X. P was calculated in step 4.

它告诉你,给定x。P在步骤4中计算,标记Y的可能性有多大。

#2


1  

I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire could or a particular found set.

我使用这个博客条目来计算云中的相对标记大小。你可以对整个can或者一个特定的找到集使用这个算法。

Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.

我没有为数据库中的所有标记存储非规范化的权重,而是在(Ruby)进程中缓存它们,并在添加/删除标记或进程重新启动时重新构建它们。

As for how to store them, you generally want:

至于如何储存,你一般希望:

  1. A tags table associating unique tag names with row IDs, and
  2. 一个标记表,将惟一的标记名与行id关联起来。
  3. A tags_items table providing you with your n-to-n mapping between tags and items.
  4. 一个tags_items表,为您提供标记和项之间的n- n映射。

Once you have that, and once you have a found set of items on a results page, it's a simple join and unique to find out the set of 'related' tags.

一旦你有了它,一旦你在结果页面上找到了一组条目,它就是一个简单的连接,并且是唯一找到“相关”标签的集合。

#3


0  

1 Each post id can be tagged with one or more tags (PHP + other tags)

每个post id都可以使用一个或多个标签(PHP +其他标签)进行标记

2 Going back the same way each tag has associated post id

返回的方式与每个标签有关联的post id相同

3 Foreach post id get all tags other than PHP

每个post id获取除PHP之外的所有标记

4 Show only those which has count more than a prticular Number (say 4000)

4 .只显示比实际数字多的数字(如4000)

Think about it this question has been tagged "Mysql" "Database-design" "Tags" and "Tagging" Do you see how you have related PHP with other tags.

想想这个问题被标记为“Mysql”“数据库设计”“标签”和“标签”你看到你是如何把PHP和其他标签联系起来的吗?

#1


2  

You probably want to look into statistics for this:

你可能想看看统计数据:

  1. given a tag X
  2. 给定一个标记X
  3. check all other tags Y
  4. 检查所有其他标签Y
  5. count how often Y and X show up at the same time
  6. 计算Y和X同时出现的频率
  7. divide by how often Y shows up
  8. 除以Y出现的频率。
  9. ???
  10. ? ? ?
  11. Profit!!!
  12. 利润! ! !

As for more information on step 5: This information only changes very slowly, so you can really cache this stuff and only recreate it when you have time.

关于第5步的更多信息:这些信息的变化非常缓慢,因此您可以真正地缓存这些内容,并且只有在您有时间时才能重新创建它。

What you want in the end is a relation

你最终想要的是一种关系

conditional_probability(X, Y, P)

Which tells you how probable (P) tag Y is, given X. P was calculated in step 4.

它告诉你,给定x。P在步骤4中计算,标记Y的可能性有多大。

#2


1  

I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire could or a particular found set.

我使用这个博客条目来计算云中的相对标记大小。你可以对整个can或者一个特定的找到集使用这个算法。

Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.

我没有为数据库中的所有标记存储非规范化的权重,而是在(Ruby)进程中缓存它们,并在添加/删除标记或进程重新启动时重新构建它们。

As for how to store them, you generally want:

至于如何储存,你一般希望:

  1. A tags table associating unique tag names with row IDs, and
  2. 一个标记表,将惟一的标记名与行id关联起来。
  3. A tags_items table providing you with your n-to-n mapping between tags and items.
  4. 一个tags_items表,为您提供标记和项之间的n- n映射。

Once you have that, and once you have a found set of items on a results page, it's a simple join and unique to find out the set of 'related' tags.

一旦你有了它,一旦你在结果页面上找到了一组条目,它就是一个简单的连接,并且是唯一找到“相关”标签的集合。

#3


0  

1 Each post id can be tagged with one or more tags (PHP + other tags)

每个post id都可以使用一个或多个标签(PHP +其他标签)进行标记

2 Going back the same way each tag has associated post id

返回的方式与每个标签有关联的post id相同

3 Foreach post id get all tags other than PHP

每个post id获取除PHP之外的所有标记

4 Show only those which has count more than a prticular Number (say 4000)

4 .只显示比实际数字多的数字(如4000)

Think about it this question has been tagged "Mysql" "Database-design" "Tags" and "Tagging" Do you see how you have related PHP with other tags.

想想这个问题被标记为“Mysql”“数据库设计”“标签”和“标签”你看到你是如何把PHP和其他标签联系起来的吗?