Sphinx与MySql -搜索好友列表(效率/速度)

I'm porting my application searches over to Sphinx from MySQL and am having a hard time figuring this one out, or if it even needs to be ported at all (I really want to know if it's worth using sphinx for this specific case for efficiency/speed):

我将我的应用程序搜索从MySQL移植到Sphinx上，并且很难弄清楚这个应用程序，或者它是否需要进行移植(我真的想知道在这个特定的情况下使用Sphinx是否值得提高效率/速度):

users
uid uname
  1    alex
  2    barry
  3    david

friends
uid | fid
  1     2
  2     1
  1     3
  3     1

Details are:
- InnoDB
- users: index on uid, index on uname
- friends: combined index on uid,fid

用户:uid上的索引，uname上的索引——朋友:uid上的联合索引，fid

Normally, to search all of alex's friends with mysql:

通常，用mysql搜索所有alex的朋友:

$uid = 1
$searchstr = "%$friendSearch%";
$query = "SELECT f.fid, u.uname FROM friends f 
          JOIN users u ON f.fid=u.uid
          WHERE f.uid=:uid AND u.uname LIKE :friendSearch";
$friends = $dbh->prepare($query);
$friends->bindParam(':uid', $uid, PDO::PARAM_INT);
$friends->bindParam(':friendSearch', $searchstr, PDO::PARAM_STR);
$friends->execute();

Is it any more efficient to find alex's friends with sphinx vs mysql or would that be an overkill?
If sphinx would be faster for this as the list hits thousands of people, what would the indexing query look like? How would I delete a friendship that no longer exists with sphinx as well, can I have a detailed example in this case? Should I change this query to use Sphinx?

用狮身人面像和mysql找到alex的朋友会更有效率吗?如果狮身人面像的速度会更快，因为列表会访问数千人，那么索引查询会是什么样子?我该如何删除与狮身人面像已经不存在的友谊，在这种情况下我能有一个详细的例子吗?我应该将该查询更改为使用Sphinx吗?

4 个解决方案

#1

Ok this is how I see this working.

这就是我看到的效果。

I have the exact same problem with MongoDB. MongoDB "offers" searching capabilities but just like MySQL you should never use them unless you wanna be choked with IO, CPU and memory problems and be forced to use a lot more servers to cope with your index than you normally would.

我对MongoDB有同样的问题。MongoDB提供了“搜索”功能，但就像MySQL一样，你不应该使用它们，除非你想让IO、CPU和内存出现问题，并*使用比平常多得多的服务器来处理索引。

The whole idea if using Sphinx (or another search tech) is to lower cost per server by having a performant index searcher.

如果使用Sphinx(或其他搜索技术)，整个想法是通过性能索引搜索器降低每个服务器的成本。

Sphinx however is not a storage engine. It is not as simple to query exact relationships across tables, they have remmedied this a little with SphinxQL but due to the nature of the full text index it still doesn't do an integral join like you would get in MySQL.

然而，Sphinx并不是存储引擎。查询表之间的确切关系并不简单，他们已经用SphinxQL稍微修改了一下，但由于全文索引的性质，它仍然不像在MySQL中那样执行整数连接。

Instead I would store the relationships within MySQL but have an index of "users" within Sphinx.

相反，我将在MySQL中存储关系，但是在Sphinx中有一个“用户”索引。

In my website I personally have 2 indexes:

在我的网站上，我个人有两个索引:

main (houses users,videos,channels and playlists)
主(包含用户、视频、频道和播放列表)
help (help system search)
帮助(帮助系统搜索)

These are delta updated once every minute. Since realtime indexes are still bit experimental at times and I personally have seen problems with high insertion/deletion rates I keep to delta updates. So I would use a delta index to update the main searchable objects of my site since this is less resource intensive and more performant than realtime indexes (from my own tests).

这些是每分钟更新一次的delta。由于实时索引有时仍处于试验阶段，我个人也遇到过插入/删除率高的问题，所以我一直保持增量更新。因此，我将使用一个增量索引来更新我的站点的主要搜索对象，因为这比实时索引(从我自己的测试中)更少的资源密集性和更强的性能。

Do note inorder to process deletions and what not your Sphinx collection through delta you will need a killlist and certain filters for your delta index. Here is an example from my index:

为了处理删除和你的Sphinx收集，你将需要一个killlist和特定的过滤器为你的delta索引。下面是我的索引中的一个例子:

source main_delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query_pre =
    sql_query = \
        SELECT id, deleted,  _id, uid, listing, title, description, category, tags, author_name, duration, rating, views, type, adult, videos, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
        FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) OR update_time >( SELECT last_index_time FROM sph_counter WHERE counter_id=1 )

    sql_query_killlist = SELECT id FROM documents WHERE update_time>=( SELECT last_index_time FROM sph_counter WHERE counter_id=1 ) OR deleted = 1
}

This processes deletions and additions once every minute which is pretty much realtime for a real web app.

它每分钟处理一次删除和添加，这对于一个真正的web应用程序来说几乎是实时的。

So now we know how to store our indexes. I need to talk about the relationships. Sphinx (even though it has SphinxQL) won't do integral joins across data so I would personally recommend doing the relationship outside of Sphinx, not only that but as I said this relationship table will get high load so this is something that could impact the Sphinx index.

现在我们知道如何存储索引了。我需要谈一谈关系。Sphinx(即使它有SphinxQL)不会跨数据进行积分连接，因此我个人建议在Sphinx之外进行这种关系，不仅如此，而且正如我所说的，这个关系表将获得高负载，因此这可能会影响Sphinx索引。

I would do a query to pick out all ids and using that set of ids use the "filter" method on the sphinx API to filter the main index down to specific document ids. Once this is done you can search in Sphinx as normal. This is the most performant method I have found to date of dealing with this.

我将对所有id进行查询，并使用sphinx API上的“filter”方法将主索引过滤为特定的文档id。完成后，您可以像往常一样在Sphinx中搜索。这是我迄今为止所发现的最有效的方法。

The key thing to remember at all times is that Sphinx is a search tech while MySQL is a storage tech. Keep that in mind and you should be ok.

要记住的关键是，狮身人面像是一种搜索技术，而MySQL是一种存储技术。

Edit

As @N.B said (which I overlooked in my answer) Sphinx does have SphinxSE. Although primative and still in sort of testing stage of its development (same as realtime indexes) it does provide an actual MyISAM/InnoDB type storage to Sphinx. This is awesome. However there are caveats (as with anything):

@N。B说(我在回答中忽略了)狮身人面像确实有狮身人面像。尽管primative并且仍然处于开发的测试阶段(与实时索引相同)，但它确实为Sphinx提供了一个真正的MyISAM/InnoDB类型的存储。这是可怕的。然而，有一些注意事项(与任何事情一样):

The language is primative
语言是主要的
The joins are primative
连接是主要

However it can/could do the job your looking for so be sure to look into it.

不管怎样，它都能做你想做的工作，所以一定要仔细看看。

#2

so I'm going to go ahead and kinda outline what -I- feel the best use cases for sphinx are and you can kinda decide if it's more or less in line for what you're looking to do.

所以我要继续讲一下，我觉得sphinx的最佳用例是什么你可以决定它是否和你想做的差不多。

If all you're looking to do is a string search one one field; then with MySQL you can do wild card searches without much trouble and honstly with an index on it unless you're expecting millions of rows you are going to be fine.

如果你要做的是字符串搜索一个字段;然后使用MySQL，你就可以轻松地进行通配符搜索了，而且还可以在上面加上索引，除非你预期会有数百万行，否则你会没事的。

Now take facebook, that is not only indexing names, but pages ect or even any advanced search fields. Sphinx can take in x columns from MySQL, PostGRES, MongoDB, (insert your db you want here) and create a searchable full-text index across all of those.

现在，以facebook为例，这不仅是索引名称，而且是页面，甚至是任何高级搜索字段。Sphinx可以从MySQL、PostGRES、MongoDB中获取x列(在这里插入您想要的db)，并创建一个可搜索的全文索引。

Example:

例子:

You have 5 fields (house number, street, city, state, zipcode) and you want to do a full text search across all of those. Now with MySQL you could do searches on every single one, however with sphinx you can glob them all together then sphinx does some awesome statistical findings based on the string you've passed in and the matches which are resulting from it.

您有5个字段(住宅编号、街道、城市、州、zipcode)，您希望在所有这些字段中进行全文搜索。在MySQL中，你可以搜索每一个，但是使用sphinx，你可以将它们结合在一起然后sphinx根据你传入的字符串和它所产生的匹配，做了一些很棒的统计结果。

This Link: PHP Sphinx Searching does a great job at walking you through what it would look like and how things work together.

这个链接:PHP Sphinx搜索可以帮助您了解它的外观，以及它们是如何协同工作的。

So you aren't really replacing a database; you're just adding a special daemon to it (sphinx) which allows you to create specialized indexes and run your full text searches against it.

所以你并没有替换数据库;您只是添加了一个特殊的守护进程(sphinx)，它允许您创建专门的索引并运行您的全文搜索。

#3

No index can help you with this query, since you're looking for the string as an infix, not a prefix (you're looking for '%friendname%', not 'friendname%'.

没有索引可以帮助您处理这个查询，因为您正在查找字符串作为中缀，而不是前缀(您正在查找'%friendname%'，而不是'friendname%'。

Moreover, the LIKE solution will get you into corners: suppose you were looking for a friend called Ann. The LIKE expression will also match Marianne, Danny etc. There's no "complete word" notion in a LIKE expression.

此外，类似的解决方案会让你陷入困境:假设你正在寻找一个叫安的朋友。类似的表达也会与玛丽安、丹尼等匹配。类似的表达中没有“完整的词”概念。

A real solution is to use a text index. A FULLTEXT index is only available on MyISAM, and MySQL 5.6 (not GA at this time) will introduce FULLTEXT on InnoDB.

真正的解决方案是使用文本索引。全文本索引只在MyISAM上可用，而MySQL 5.6(此时不是GA)将在InnoDB上引入全文本。

Otherwise you can indeed use Sphinx to search the text.

否则，您确实可以使用Sphinx搜索文本。

With just hundreds or thousands, you will probably not see a big difference, unless you're really going to do many searches per second. With larger numbers, you will eventually realize that a full table scan is inferior to Sphinx search.

只有成百上千，你可能不会看到很大的不同，除非你真的要每秒钟做很多搜索。有了更大的数字，您最终会发现完整的表扫描比Sphinx搜索要差。

I'm using Sphinx a lot, on dozens and sometimes hundreds of millions large texts, and can testify it works like a charm.

我经常用狮身人面像，在很多甚至数亿的大型文本中使用，并且可以证明它像一个魅力。

The problem with Sphinx is, of course, that it's an external tool. With Sphinx you have to tell it to read data from your database. You can do so (using crontab for example) every 5 minutes, every hour, etc. So if rows are DELETEd, they will only be removed from sphinx the next time it reads the data from table. If you can live with that - that's the simplest solution.

当然，狮身人面像的问题在于它是一个外部工具。使用Sphinx，您必须让它从数据库中读取数据。您可以每5分钟、每小时等(使用crontab)，因此如果删除了行，它们将在下一次从表中读取数据时从sphinx中删除。如果你能接受这一点——这是最简单的解决方案。

If you can't, there are real time indexes in sphinx, so you may directly instruct it to remove certain rows. I am unable to explain everything in this port, so here are a couple links for you:

如果不能，sphinx中有实时索引，因此可以直接指示它删除某些行。我无法解释这个港口的一切，所以这里有几个链接给你:

Index updates

索引更新

Real time indexes

实时索引

As final conclusion, you have three options:

最后，你有三个选择:

Risk it and use a full table scan, assuming you won't have high load.
如果你没有高负载，就冒险使用全表扫描。
Wait for MySQL 5.6 and use FULLTEXT with InnoDB.
等待MySQL 5.6，使用InnoDB的全文。
Use sphinx
使用斯芬克斯

At this point in time, I would certainly use option #3: use sphinx.

在这一点上，我肯定会使用选项#3:使用sphinx。

#4

Take a look at the solution I propose here: https://*.com/a/22531268/543814

看看我在这里提出的解决方案:https://*.com/a/22531268/543814

Your friend names are probably short, and your query looks simple enough. You can probably afford to store all suffixes, perhaps in a separate table, pointing back to the original table to get the full name.

你的朋友的名字可能很短，你的查询看起来很简单。您可能可以存储所有的后缀，可能在一个单独的表中，指向原始表以获得全名。

This would give you fast infix search at the cost of a little bit more storage space.

这会给你快速的插入搜索以节省更多的存储空间。

Furthermore, to avoid finding 'Marianne' when searching for 'Ann', consider:

此外，为了避免在搜索“Ann”时发现“Marianne”，请考虑:

Using case-sensitive search. (Fragile; may break if your users enter their names or their search queries with incorrect capitalization.)
使用区分大小写的搜索。(脆弱;如果你的用户输入他们的名字或他们的搜索查询不正确的大小写，可能会中断)。
After the query, filtering your search results further, requiring word boundaries around the search term (e.g. regex \bAnn\b).
查询之后，进一步过滤搜索结果，需要在搜索词周围设置单词边界(例如regex \bAnn\b)。

#1

Ok this is how I see this working.

这就是我看到的效果。

The whole idea if using Sphinx (or another search tech) is to lower cost per server by having a performant index searcher.

如果使用Sphinx(或其他搜索技术)，整个想法是通过性能索引搜索器降低每个服务器的成本。

Instead I would store the relationships within MySQL but have an index of "users" within Sphinx.

相反，我将在MySQL中存储关系，但是在Sphinx中有一个“用户”索引。

In my website I personally have 2 indexes:

在我的网站上，我个人有两个索引:

main (houses users,videos,channels and playlists)
主(包含用户、视频、频道和播放列表)
help (help system search)
帮助(帮助系统搜索)

Do note inorder to process deletions and what not your Sphinx collection through delta you will need a killlist and certain filters for your delta index. Here is an example from my index:

为了处理删除和你的Sphinx收集，你将需要一个killlist和特定的过滤器为你的delta索引。下面是我的索引中的一个例子:

source main_delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query_pre =
    sql_query = \
        SELECT id, deleted,  _id, uid, listing, title, description, category, tags, author_name, duration, rating, views, type, adult, videos, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
        FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) OR update_time >( SELECT last_index_time FROM sph_counter WHERE counter_id=1 )

    sql_query_killlist = SELECT id FROM documents WHERE update_time>=( SELECT last_index_time FROM sph_counter WHERE counter_id=1 ) OR deleted = 1
}

This processes deletions and additions once every minute which is pretty much realtime for a real web app.

它每分钟处理一次删除和添加，这对于一个真正的web应用程序来说几乎是实时的。

The key thing to remember at all times is that Sphinx is a search tech while MySQL is a storage tech. Keep that in mind and you should be ok.

要记住的关键是，狮身人面像是一种搜索技术，而MySQL是一种存储技术。

Edit

The language is primative
语言是主要的
The joins are primative
连接是主要

However it can/could do the job your looking for so be sure to look into it.

不管怎样，它都能做你想做的工作，所以一定要仔细看看。

#2

so I'm going to go ahead and kinda outline what -I- feel the best use cases for sphinx are and you can kinda decide if it's more or less in line for what you're looking to do.

所以我要继续讲一下，我觉得sphinx的最佳用例是什么你可以决定它是否和你想做的差不多。

Example:

例子:

This Link: PHP Sphinx Searching does a great job at walking you through what it would look like and how things work together.

这个链接:PHP Sphinx搜索可以帮助您了解它的外观，以及它们是如何协同工作的。

So you aren't really replacing a database; you're just adding a special daemon to it (sphinx) which allows you to create specialized indexes and run your full text searches against it.

所以你并没有替换数据库;您只是添加了一个特殊的守护进程(sphinx)，它允许您创建专门的索引并运行您的全文搜索。

#3

No index can help you with this query, since you're looking for the string as an infix, not a prefix (you're looking for '%friendname%', not 'friendname%'.

没有索引可以帮助您处理这个查询，因为您正在查找字符串作为中缀，而不是前缀(您正在查找'%friendname%'，而不是'friendname%'。

A real solution is to use a text index. A FULLTEXT index is only available on MyISAM, and MySQL 5.6 (not GA at this time) will introduce FULLTEXT on InnoDB.

真正的解决方案是使用文本索引。全文本索引只在MyISAM上可用，而MySQL 5.6(此时不是GA)将在InnoDB上引入全文本。

Otherwise you can indeed use Sphinx to search the text.

否则，您确实可以使用Sphinx搜索文本。

只有成百上千，你可能不会看到很大的不同，除非你真的要每秒钟做很多搜索。有了更大的数字，您最终会发现完整的表扫描比Sphinx搜索要差。

I'm using Sphinx a lot, on dozens and sometimes hundreds of millions large texts, and can testify it works like a charm.

我经常用狮身人面像，在很多甚至数亿的大型文本中使用，并且可以证明它像一个魅力。

If you can't, there are real time indexes in sphinx, so you may directly instruct it to remove certain rows. I am unable to explain everything in this port, so here are a couple links for you:

如果不能，sphinx中有实时索引，因此可以直接指示它删除某些行。我无法解释这个港口的一切，所以这里有几个链接给你:

Index updates

索引更新

Real time indexes

实时索引

As final conclusion, you have three options:

最后，你有三个选择:

Risk it and use a full table scan, assuming you won't have high load.
如果你没有高负载，就冒险使用全表扫描。
Wait for MySQL 5.6 and use FULLTEXT with InnoDB.
等待MySQL 5.6，使用InnoDB的全文。
Use sphinx
使用斯芬克斯

At this point in time, I would certainly use option #3: use sphinx.

在这一点上，我肯定会使用选项#3:使用sphinx。

#4

Take a look at the solution I propose here: https://*.com/a/22531268/543814

看看我在这里提出的解决方案:https://*.com/a/22531268/543814

你的朋友的名字可能很短，你的查询看起来很简单。您可能可以存储所有的后缀，可能在一个单独的表中，指向原始表以获得全名。

This would give you fast infix search at the cost of a little bit more storage space.

这会给你快速的插入搜索以节省更多的存储空间。

Furthermore, to avoid finding 'Marianne' when searching for 'Ann', consider:

此外，为了避免在搜索“Ann”时发现“Marianne”，请考虑:

Using case-sensitive search. (Fragile; may break if your users enter their names or their search queries with incorrect capitalization.)
使用区分大小写的搜索。(脆弱;如果你的用户输入他们的名字或他们的搜索查询不正确的大小写，可能会中断)。
After the query, filtering your search results further, requiring word boundaries around the search term (e.g. regex \bAnn\b).
查询之后，进一步过滤搜索结果，需要在搜索词周围设置单词边界(例如regex \bAnn\b)。

秒客网

Sphinx与MySql -搜索好友列表(效率/速度)

4 个解决方案

#1

Edit

#2

#3

#4

#1

Edit

#2

#3

#4

相关文章