PostgreSQL will not use an index for projection if an index is used for the query condition

Time: 2022-09-20 13:36:38

I've been running some experiments, and the statement in the title seems to be true. I'd like to know if there's a way around it.

Here's an example:

CREATE TABLE test ( cond text, v1 integer, v2 integer, v3 integer );
-- Insert millions of rows
CREATE INDEX cond_idx ON test (cond);
CREATE INDEX values_idx ON test (v1, v2, v3);
VACUUM ANALYZE test;
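
To reproduce the experiment, something like the following could stand in for the `-- Insert millions of rows` step. This is only a sketch: the row count and value ranges are assumptions, not figures from the question.

```sql
-- Hypothetical test data: 5 million rows, roughly 1000 distinct cond values.
-- Run this before creating the indexes, then VACUUM ANALYZE as shown above.
INSERT INTO test (cond, v1, v2, v3)
SELECT (random() * 1000)::integer::text,  -- cond: '0' .. '1000'
       (random() * 100)::integer,         -- v1
       (random() * 100)::integer,         -- v2
       (random() * 100)::integer          -- v3
FROM generate_series(1, 5000000);
```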

Running these queries:

-- Uses Index Only Scan on values_idx for projection
SELECT sum(v1), sum(v2), sum(v3) FROM test;
-- Uses Bitmap Index Scan on cond_idx then a Bitmap Heap Scan
-- This is undesirable as it doesn't rely exclusively on indexes
SELECT sum(v1), sum(v2), sum(v3) FROM test WHERE cond = '123';
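
One way to check which plan each query gets is to prefix it with EXPLAIN. The sketch below uses the ANALYZE and BUFFERS options, which actually execute the query and report block accesses. No output is shown here because it depends on the data and the PostgreSQL version.

```sql
-- Expect an Index Only Scan node here; its "Heap Fetches" line shows
-- how many rows still had to be checked in the table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(v1), sum(v2), sum(v3) FROM test;

-- Compare with the filtered query, which goes through cond_idx.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(v1), sum(v2), sum(v3) FROM test WHERE cond = '123';
```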

PostgreSQL can combine indexes effectively, but apparently only for compound conditions. Is there a way to have PostgreSQL use an index for projection after it has used one or more indexes to find the needed rows?

The obvious answer would be to create a single index on all 4 columns. The thing is, this is just a minimal example. In the real-world scenario, the same table would be queried by different columns, requiring a multi-column index for each query condition.

UPDATE: Changed the count to a sum to make the example more understandable. Also added more "value" columns.

1 answer

#1

The count() aggregate cannot be answered from cond_idx alone, because count(expression) counts 1 for every row where the expression is not null, and cond_idx does not contain that column: http://www.postgresql.org/docs/9.4/static/functions-aggregate.html

count(expression): 
    number of input rows for which the value of expression is not null
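
A small self-contained illustration of that definition, using an inline VALUES list rather than the test table (the alias t and the sample values are made up for the example):

```sql
-- count(*) counts rows; count(v1) skips rows where v1 is null.
SELECT count(*)  AS all_rows,
       count(v1) AS non_null_v1
FROM (VALUES (1), (NULL), (3)) AS t(v1);
-- all_rows = 3, non_null_v1 = 2
```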

In the second query, after filtering through the index on cond, PostgreSQL still doesn't know which rows have a null in the value column, so it has to visit the table to find out.

You just need to add the relevant data to your index by using a multi-column index:

-- "value" is the single value column from the question before its UPDATE
CREATE INDEX cond_value_idx ON test (cond, value);
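
Applied to the updated table from the question (columns v1, v2, v3), the same idea might look like the sketch below. The index names are hypothetical, and only one of the two options would be needed. The INCLUDE form assumes PostgreSQL 11 or later, which is newer than the 9.4 documentation linked above.

```sql
-- Option 1: plain multi-column index. cond leads so it can serve the WHERE clause.
CREATE INDEX cond_values_idx ON test (cond, v1, v2, v3);

-- Option 2 (PostgreSQL 11+): covering index. v1..v3 are stored as payload
-- only and are not part of the search key.
CREATE INDEX cond_covering_idx ON test (cond) INCLUDE (v1, v2, v3);

-- Index-only scans rely on the visibility map, so vacuum before testing.
VACUUM ANALYZE test;

-- Check whether the planner now picks an Index Only Scan.
EXPLAIN SELECT sum(v1), sum(v2), sum(v3) FROM test WHERE cond = '123';
```

Whether the planner actually chooses the index-only scan still depends on table statistics and on how many pages are marked all-visible.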

This may become clearer once you read the excellent http://use-the-index-luke.com/

Here is an analogy to give a better picture of PostgreSQL's internals. You have 1000 regular books and 2 "special" books. The 1000 books are your rows; the other 2 are your indexes.

One index book lists the shelf and number of every book, classified by theme; the other lists the shelf and number of every book, classified by author.

Note that the 1000 books are stored on a huge shelf, while the 2 index books sit on your desk, ready to be used.

The problem is that some books are so unique that they are not classified by theme (our null values).

If you want to count the books that have a theme, you only need the "theme" index. But if you want to count all of Gregory Smith's books that have a theme, you have to look up Gregory Smith's books in the author index and then pull each one off the shelf to see whether it has a theme.

The solution here is a third index book that lists the shelf and number of every book, classified by author and then by theme. Only then can you answer the question immediately without going to the shelves.

Note that the column order of a multi-column index matters: you can't answer the same question as easily with an index book classified by theme and then by author.
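
In terms of the test table, the column-order point can be sketched with two hypothetical indexes over the same pair of columns. The index names are made up, and the actual plan choice is left to the planner.

```sql
-- cond first: all entries for cond = '123' sit next to each other in the index,
-- like the book classified by author and then by theme.
CREATE INDEX cond_v1_idx ON test (cond, v1);

-- v1 first: entries for cond = '123' are spread across every v1 value,
-- like the book classified by theme and then by author.
CREATE INDEX v1_cond_idx ON test (v1, cond);

VACUUM ANALYZE test;

-- Inspect which of the two indexes the planner prefers for this filter.
EXPLAIN SELECT sum(v1) FROM test WHERE cond = '123';
```

Both indexes contain everything the query needs, but a B-tree is generally most efficient when the condition is on its leading column.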
