Finding the alphabetical position in a large list

Time: 2022-09-12 23:49:16

I have an AS/400 table containing roughly 1 million rows of full names and company names. I would like to move it to another datastore while still matching the speed of the original.

Currently, a user enters a search term and almost instantaneously gets the alphabetical position of that term in the table, along with a page of matches. The user can then page up or down through the records very quickly.

The data is almost never updated, and there are approximately 50 inserts per week. I expect any database can maintain an alphabetical index of the names, but I'm unsure how to quickly find the position of the search term within the dataset. Any suggestions are greatly appreciated.

3 solutions

#1


This sounds just like a regular pagination of results, except that instead of going to a specific page based on a page number or offset being requested, it goes to a specific page based on where the user's search fits in the results alphabetically.

Let's say you want to fetch 10 rows after this position, and 10 rows before.

If the user searches for 'Smith', you could run two SELECTs:

SELECT
  name
FROM
  companies
WHERE
  name < 'Smith'
ORDER BY
  name DESC
LIMIT 10

and then

SELECT
  name
FROM
  companies
WHERE
  name >= 'Smith'
ORDER BY
  name
LIMIT 10

You could use a UNION to fetch both halves in one query. The version above is simplified.

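The UNION version can be sketched in Python against SQLite. SQLite is used here only because it ships with Python. The table name `companies` comes from the queries above. The `id` column, the index name and the helpers `make_demo_db` and `window_around` are hypothetical. Each half keeps its own ORDER BY and LIMIT inside a subquery. UNION ALL is used because the two halves cannot overlap and duplicate names should not be collapsed. A marker column records where the search term falls. SQLite's default BINARY collation sorts uppercase letters before lowercase ones, which is why the demo uses capitalized names.

```python
import sqlite3

def make_demo_db(names):
    """Build an in-memory stand-in for the companies table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_companies_name ON companies (name)")
    conn.executemany("INSERT INTO companies (name) VALUES (?)", [(n,) for n in names])
    return conn

def window_around(conn, term, n=10):
    """Return (names, split): up to n names before `term` and up to n names at or
    after it, all in ascending order. `split` is the index where `term` fits."""
    rows = conn.execute(
        """
        SELECT name, 0 AS after_term FROM (
            SELECT name FROM companies WHERE name < ? ORDER BY name DESC LIMIT ?
        )
        UNION ALL
        SELECT name, 1 AS after_term FROM (
            SELECT name FROM companies WHERE name >= ? ORDER BY name LIMIT ?
        )
        ORDER BY after_term, name
        """,
        (term, n, term, n),
    ).fetchall()
    names = [name for name, _ in rows]
    # Rows from the "before" half are the ones flagged with after_term = 0.
    split = sum(1 for _, after in rows if after == 0)
    return names, split

conn = make_demo_db(["Adams", "Baker", "Clark", "Jones", "Smith", "Smith", "Taylor", "White"])
print(window_around(conn, "Smith", n=2))  # → (['Clark', 'Jones', 'Smith', 'Smith'], 2)
```

With n=10, as in the answer, `split` is 10 whenever at least ten names sort before the term. In that case the first exact match is the eleventh row.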

Once the first result set is put back into ascending order, the term the user searched for falls halfway through these results. If there are any exact matches, the first exact match will be the eleventh row.

Note that if the user searches for 'aaaaaaaa', they'll probably just get the first 10 results with nothing before them. For 'zzzzzzzz', they may get just the last 10 results.

I'm assuming that the SQL engine in question allows >= and < comparisons between strings and can optimise them using an index. I haven't tested this, so it may not be possible. If the engine supports internationalized collations, as MySQL does, you could even have non-ASCII characters ordered correctly.

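Paging up or down from that first window can reuse the same comparison idea. Instead of the search term, use the first or last row currently shown as the bound. This is often called keyset or seek pagination. The sketch below is an illustration under assumptions, in Python with SQLite. The `id` tiebreaker column and the helpers `make_demo_db`, `page_down` and `page_up` are hypothetical. The tiebreaker ensures that duplicate names, such as two 'Smith' rows, are neither skipped nor repeated.

```python
import sqlite3

def make_demo_db(names):
    """In-memory stand-in for the companies table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_companies_name ON companies (name)")
    conn.executemany("INSERT INTO companies (name) VALUES (?)", [(n,) for n in names])
    return conn

def page_down(conn, last_name, last_id, n=10):
    """Next n rows strictly after (last_name, last_id), in ascending (name, id) order."""
    return conn.execute(
        """
        SELECT id, name FROM companies
        WHERE name > ? OR (name = ? AND id > ?)
        ORDER BY name, id
        LIMIT ?
        """,
        (last_name, last_name, last_id, n),
    ).fetchall()

def page_up(conn, first_name, first_id, n=10):
    """Previous n rows strictly before (first_name, first_id), in ascending order."""
    rows = conn.execute(
        """
        SELECT id, name FROM companies
        WHERE name < ? OR (name = ? AND id < ?)
        ORDER BY name DESC, id DESC
        LIMIT ?
        """,
        (first_name, first_name, first_id, n),
    ).fetchall()
    return rows[::-1]  # fetched nearest-first, so flip back for display

# Rows receive ids 1..8 in insertion order, so the first 'Smith' has id 5.
conn = make_demo_db(["Adams", "Baker", "Clark", "Jones", "Smith", "Smith", "Taylor", "White"])
print(page_down(conn, "Smith", 5, n=2))  # → [(6, 'Smith'), (7, 'Taylor')]
print(page_up(conn, "Smith", 5, n=2))    # → [(3, 'Clark'), (4, 'Jones')]
```

Some engines support row-value comparisons, including PostgreSQL and SQLite 3.15 or later. There you can write the condition as (name, id) > (?, ?), which some planners match to a composite index more readily than the OR form.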

#2


If by "the position of the search" you mean the number of the record if they were enumerated alphabetically, you may want to try something like:

select count(*) from companies where name < 'Smith'

Most databases ought to handle that reasonably well, particularly with an index on name (but try it: theories you read on the web don't trump empirical data).

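Here is a minimal sketch of that count in Python with SQLite. The table and column names come from the query above. The helper names `make_demo_db` and `alphabetical_position` and the demo data are hypothetical. The number of names sorting strictly before the term is the 0-based position the term would take. That is the same number `bisect.bisect_left` returns on an alphabetically sorted list, so the sketch cross-checks the two.

```python
import bisect
import sqlite3

def make_demo_db(names):
    """In-memory stand-in for the companies table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_companies_name ON companies (name)")
    conn.executemany("INSERT INTO companies (name) VALUES (?)", [(n,) for n in names])
    return conn

def alphabetical_position(conn, term):
    """0-based position `term` would take in the alphabetical listing,
    i.e. the number of names that sort strictly before it."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM companies WHERE name < ?", (term,)
    ).fetchone()
    return count

names = ["Adams", "Baker", "Clark", "Jones", "Smith", "Smith", "Taylor", "White"]
conn = make_demo_db(names)
pos = alphabetical_position(conn, "Smith")
# SQLite's default BINARY collation compares UTF-8 bytes, which matches Python's
# code-point ordering of str, so bisect over sorted() should give the same answer.
assert pos == bisect.bisect_left(sorted(names), "Smith")
print(pos)  # → 4
```

On a typical B-tree index, this count is a range scan whose cost grows with the number of rows counted. It is not a single lookup. On about a million rows, it is worth measuring, as the answer suggests.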

#3


Just to add to the ordering suggestions:

  • Add an index on the name column if this is your standard means of data retrieval.

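  • You can paginate by combining LIMIT and OFFSET. Large offsets get slower, though, because the database still has to step past every skipped row. The range-based queries in #1 avoid that cost.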
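Both suggestions can be sketched in Python with SQLite. The schema, the index name and the helper names are hypothetical. `CREATE INDEX` adds the index on name, and `LIMIT ... OFFSET ...` fetches a numbered page. `EXPLAIN QUERY PLAN` is one way to confirm that a range query on name actually uses the index rather than scanning the table. Other databases have their own EXPLAIN variants.

```python
import sqlite3

def make_demo_db(names):
    """In-memory stand-in for the companies table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # The suggested index: keeps names in alphabetical order for ORDER BY and ranges.
    conn.execute("CREATE INDEX idx_companies_name ON companies (name)")
    conn.executemany("INSERT INTO companies (name) VALUES (?)", [(n,) for n in names])
    return conn

def page_by_offset(conn, page, page_size=10):
    """Return 0-based page `page` of names in alphabetical order via LIMIT/OFFSET."""
    rows = conn.execute(
        "SELECT name FROM companies ORDER BY name LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return [name for (name,) in rows]

def uses_name_index(conn, sql, params=()):
    """True if SQLite's query plan for `sql` mentions idx_companies_name."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan description is the last column of each plan row.
    return any("idx_companies_name" in str(row[-1]) for row in plan)

conn = make_demo_db(["Adams", "Baker", "Clark", "Jones", "Smith", "Smith", "Taylor", "White"])
print(page_by_offset(conn, 1, page_size=3))  # → ['Jones', 'Smith', 'Smith']
range_sql = "SELECT name FROM companies WHERE name >= ? ORDER BY name LIMIT 10"
print(uses_name_index(conn, range_sql, ("Smith",)))
```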
