Does SQLite3 support indexes? How can I speed up my query?

I am executing this query:

// Routes that have a row covering the origin station AND a row covering the
// destination station. Adjacent @"..." literals are concatenated by the compiler.
NSString *querySQL = [NSString stringWithFormat:
    @"SELECT DISTINCT P1.ID_RUTA_PARADAS "
    @"FROM FastParadas AS P1 "
    @"WHERE P1.ID_ESTACION_INIT <= %d AND "
    @"      %d <= P1.ID_ESTACION_END "
    @"INTERSECT "
    @"SELECT DISTINCT P2.ID_RUTA_PARADAS "
    @"FROM FastParadas AS P2 "
    @"WHERE P2.ID_ESTACION_INIT <= %d AND "
    @"      %d <= P2.ID_ESTACION_END",
    (int)estacionOrigen.ID_Estacion, (int)estacionOrigen.ID_Estacion,
    (int)estacionDestino.ID_Estacion, (int)estacionDestino.ID_Estacion];

I want to speed it up. I tried creating some indexes, but there was no improvement. Does SQLite3 support indexes?

The database has 3900+ rows, and this query has to be repeated 1800+ times in less than a second.

3 answers

#1

The database has 3900+ rows, and this query has to be repeated 1800+ times in less than a second.

No. Not going to happen outside of a machine with fantastically HUGE memory bandwidth using a highly optimized algorithm that scans the data in memory.

In any situation like this, it is critical that you design the data model so that this kind of query simply isn't necessary. 3900+ rows is really not that much, but 1800+ queries against that data is a hell of a lot.

Your best bet is to pursue a schema that eliminates the need for 1800+ queries per second or, worst case, to design the app so that those 1800+ queries run behind a progress bar or something similar.
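
One possible way to follow this advice, sketched here in plain C (it compiles unchanged as Objective-C), is to load FastParadas once and precompute, for every station, a bitset of the routes whose [ID_ESTACION_INIT, ID_ESTACION_END] range covers it. The INTERSECT query then becomes a bitwise AND of two bitsets. The bounds MAX_STATIONS and MAX_ROUTES, the Parada struct, and the function names are hypothetical, and the sketch assumes station and route IDs are small non-negative integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounds: station and route IDs are assumed to be small
   non-negative integers below these limits. */
#define MAX_STATIONS 512
#define MAX_ROUTES   1024
#define WORDS        (MAX_ROUTES / 64)

/* One FastParadas row, loaded once (e.g. with a single SELECT). */
typedef struct { int route, init, end; } Parada;

/* covers[s] has bit r set if some row of route r has init <= s <= end. */
static uint64_t covers[MAX_STATIONS][WORDS];

void build_cover_table(const Parada *rows, int n) {
    memset(covers, 0, sizeof covers);
    for (int i = 0; i < n; i++) {
        const Parada *p = &rows[i];
        assert(p->route >= 0 && p->route < MAX_ROUTES);
        assert(p->init >= 0 && p->end < MAX_STATIONS);
        for (int s = p->init; s <= p->end; s++)
            covers[s][p->route / 64] |= UINT64_C(1) << (p->route % 64);
    }
}

/* Same result as the INTERSECT query: routes with a row covering `origin`
   and a row covering `dest`. Writes ascending route IDs into out (room for
   MAX_ROUTES entries) and returns how many were written. */
int routes_between(int origin, int dest, int *out) {
    assert(origin >= 0 && origin < MAX_STATIONS);
    assert(dest >= 0 && dest < MAX_STATIONS);
    int n = 0;
    for (int w = 0; w < WORDS; w++) {
        uint64_t both = covers[origin][w] & covers[dest][w];
        for (int b = 0; b < 64; b++)
            if (both & (UINT64_C(1) << b))
                out[n++] = w * 64 + b;
    }
    return n;
}
```

Each of the 1800+ lookups is then an AND over MAX_ROUTES / 64 machine words with no SQL involved; the table only has to be rebuilt when FastParadas changes.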

#2

Besides the points from @bbum and @ipmcc regarding physical limitations, in theory you won't have much luck with indexes either. What you are looking for is the ID_RUTA_PARADAS value of every tuple whose ID_ESTACION_INIT is less than or equal to some value and whose ID_ESTACION_END is greater than or equal to that same value (just to put it into natural language).

How could an index help with that?

(1) Say you have an index on ID_ESTACION_INIT that supports range queries. You could get the ids of all rows satisfying ID_ESTACION_INIT <= %d relatively fast. But then you would have to fetch every one of those rows to find out whether it also satisfies %d <= P1.ID_ESTACION_END.

(2) Say you have an index on ID_ESTACION_INIT and another on ID_ESTACION_END, both supporting range queries. Then each index could return the rows satisfying its predicate, and the rowids returned by both indexes could be used to fetch ID_RUTA_PARADAS.
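
A minimal C sketch of approach (1), using an array sorted by init as a stand-in for an index on ID_ESTACION_INIT (an illustrative assumption; it does not model SQLite's actual B-tree). The binary search isolates the rows with init <= x cheaply, but every one of them still has to be examined to test x <= end. The Row struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One FastParadas row. The array, kept sorted by init, stands in for
   an index on ID_ESTACION_INIT. */
typedef struct { int init, end, route; } Row;

static int by_init(const void *a, const void *b) {
    const Row *x = a, *y = b;
    return (x->init > y->init) - (x->init < y->init);
}

void sort_by_init(Row *rows, int n) { qsort(rows, (size_t)n, sizeof *rows, by_init); }

/* Number of rows with init <= x (upper-bound binary search, O(log n)). */
int prefix_len(const Row *rows, int n, int x) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (rows[mid].init <= x) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Approach (1): the "index" narrows down init <= x cheaply, but every row in
   that prefix must still be fetched to test x <= end. Stores matching routes
   in out, reports the number of rows examined, returns the number of matches. */
int stab_with_init_index(const Row *rows, int n, int x, int *out, int *examined) {
    assert(n >= 0 && out != NULL && examined != NULL);
    int k = prefix_len(rows, n, x), m = 0;
    for (int i = 0; i < k; i++)
        if (x <= rows[i].end)
            out[m++] = rows[i].route;
    *examined = k;
    return m;
}
```

When x is near the high end of the station range, the prefix covers almost the whole table, so the "index" saves little over a full scan.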

The problem with both approaches is that using them requires random disk access, which only makes sense for small result sets (i.e. when few rows satisfy the predicates). For larger result sets (I think I heard of a threshold of around 5% of the table, but that may have been just an example), your database system would fall back to a table scan to find all matching tuples, which means your index does not help.

Here is a SQLFiddle to play around with indexes, and maybe also other DBMSs: http://sqlfiddle.com/#!5/d1a86/2

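(In fact, a clustered index could help by reading fewer non-qualifying tuples, but SQLite does not support clustered indexes as such: see "sqlite: Fastest way to get all rows (consecutive disk access)". SQLite 3.8.2 and later do offer WITHOUT ROWID tables, which store rows in PRIMARY KEY order and so behave much like a clustered index.)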

#3

In this query, the INTERSECT already takes care of removing duplicates, so you don't need the DISTINCT. The following query might be even faster:

SELECT DISTINCT ID_RUTA_PARADAS
FROM FastParadas
WHERE %d BETWEEN ID_ESTACION_INIT AND ID_ESTACION_END
  AND %d BETWEEN ID_ESTACION_INIT AND ID_ESTACION_END
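
Note that this rewrite asks for a single row that covers both stations, while the INTERSECT version also accepts a route whose origin and destination are covered by different rows. The two return the same routes only if each ID_RUTA_PARADAS value appears in just one row. A small brute-force C sketch of both semantics (the Row struct and function names are hypothetical) makes the difference concrete:

```c
#include <assert.h>
#include <stdbool.h>

/* One FastParadas row (hypothetical field names). */
typedef struct { int route, init, end; } Row;

static bool covers(const Row *r, int station) {
    return r->init <= station && station <= r->end;
}

/* INTERSECT version: the route needs SOME row covering a and SOME
   (possibly different) row covering b. */
bool in_intersect(const Row *rows, int n, int route, int a, int b) {
    bool hit_a = false, hit_b = false;
    for (int i = 0; i < n; i++) {
        if (rows[i].route != route) continue;
        hit_a = hit_a || covers(&rows[i], a);
        hit_b = hit_b || covers(&rows[i], b);
    }
    return hit_a && hit_b;
}

/* BETWEEN version: a SINGLE row of the route must cover both a and b. */
bool in_between(const Row *rows, int n, int route, int a, int b) {
    for (int i = 0; i < n; i++)
        if (rows[i].route == route && covers(&rows[i], a) && covers(&rows[i], b))
            return true;
    return false;
}
```

With a route split into rows covering [0, 3] and [8, 9], stations 3 and 9 match under the INTERSECT semantics but not under the BETWEEN semantics.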

However, a range query like this cannot be easily optimized with normal indexes. You should change your database to use a one-dimensional R-tree index, in which case 1800 queries/s might be possible.
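
If your SQLite build includes the R*Tree module (compile option SQLITE_ENABLE_RTREE), a one-dimensional R-tree can be declared with CREATE VIRTUAL TABLE ... USING rtree(id, minX, maxX), storing each row's ID_ESTACION_INIT and ID_ESTACION_END as minX and maxX, and queried with minX <= s AND maxX >= s. To show why that helps, here is a minimal C sketch of a static, bulk-loaded ("packed") 1-D R-tree. It is a simplification, not SQLite's R*Tree implementation, and FANOUT, the struct names, and the functions are hypothetical. Each node stores the bounding interval of its children, so a stabbing query skips every subtree whose interval does not contain the station.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 8  /* hypothetical node capacity */

/* One FastParadas row: route `route` covers stations [lo, hi]. */
typedef struct { int lo, hi, route; } Interval;

/* A node's bounding interval plus the index range of its children. */
typedef struct { int lo, hi, first, count; } Node;

typedef struct {
    Interval *items;  /* sorted by lo; leaf children index into this */
    Node *nodes;      /* leaves first, then each upper level; root last */
    int n_leaves, root;
} RTree1D;

static int by_lo(const void *a, const void *b) {
    const Interval *x = a, *y = b;
    return (x->lo > y->lo) - (x->lo < y->lo);
}

/* Builds a node over children [first, first + count) of items (leaf level)
   or of kids (upper levels), with the smallest interval enclosing them. */
static Node make_node(int first, int count, const Interval *items, const Node *kids) {
    Node nd = { 0, 0, first, count };
    for (int j = first; j < first + count; j++) {
        int lo = items ? items[j].lo : kids[j].lo;
        int hi = items ? items[j].hi : kids[j].hi;
        if (j == first || lo < nd.lo) nd.lo = lo;
        if (j == first || hi > nd.hi) nd.hi = hi;
    }
    return nd;
}

/* Bulk-loads a packed R-tree over n >= 1 intervals; sorts items in place. */
RTree1D rtree_build(Interval *items, int n) {
    assert(n >= 1);
    qsort(items, (size_t)n, sizeof *items, by_lo);
    RTree1D t = { items, malloc((size_t)(2 * n + 1) * sizeof(Node)), 0, 0 };
    assert(t.nodes != NULL);
    int count = 0;
    for (int i = 0; i < n; i += FANOUT) {              /* leaf level */
        Node nd = make_node(i, n - i < FANOUT ? n - i : FANOUT, items, NULL);
        t.nodes[count++] = nd;
    }
    t.n_leaves = count;
    for (int start = 0, end = count; end - start > 1; start = end, end = count) {
        for (int i = start; i < end; i += FANOUT) {    /* one upper level */
            Node nd = make_node(i, end - i < FANOUT ? end - i : FANOUT, NULL, t.nodes);
            t.nodes[count++] = nd;
        }
    }
    t.root = count - 1;
    return t;
}

/* Appends the route of every interval in the subtree that contains x. */
static int stab(const RTree1D *t, int node, int x, int *out, int m) {
    const Node *nd = &t->nodes[node];
    if (x < nd->lo || x > nd->hi)
        return m;                                      /* prune whole subtree */
    for (int j = nd->first; j < nd->first + nd->count; j++) {
        if (node < t->n_leaves) {
            const Interval *iv = &t->items[j];
            if (iv->lo <= x && x <= iv->hi) out[m++] = iv->route;
        } else {
            m = stab(t, j, x, out, m);
        }
    }
    return m;
}

/* Stabbing query: routes of all intervals with lo <= x <= hi.
   out needs room for n entries; returns the number written. */
int rtree_stab(const RTree1D *t, int x, int *out) { return stab(t, t->root, x, out, 0); }

void rtree_free(RTree1D *t) { free(t->nodes); t->nodes = NULL; }
```

For the original query you would stab once for the origin and once for the destination and intersect the two route lists; each stab prunes every subtree whose bounding interval excludes the station instead of testing all 3900+ rows.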