Using partitioned indexes with partitioned tables

Time: 2022-09-17 23:22:49

I'm trying to understand the optimal way to construct composite local partitioned indexes for use with partitioned tables.

Here is my example table:

ADDRESS
id
street
city
state
tenant

The Address table is list partitioned upon the tenant column. Pretty much all of the queries will have the tenant column in the query, so there's really no concern for cross-partition searches here.

Ultimately, I want a query like select * from address where tenant = 'X' and street = 'Y' and city = 'Z' to perform as well as possible. It seems to me that the right approach is to first restrict the search to the particular tenant (partition) and then use the local partitioned index.

Now, I believe that only one index can be used per referenced table, so I want to create the composite local partitioned index that will be most useful. I envision the composite index containing street and city. So I have two questions:

  1. Should tenant have an index by itself?

  2. Should the tenant be part of the composite index?

Some explanation of why it should be one way or the other would be helpful, as I don't think I fully understand how partitions work with partitioned indexes.

2 Answers

#1


create index address_city_street_idx on address(city, street) compress 1 local;

I believe that index is ideal for this query, given a table that is list-partitioned on TENANT:

select * from address where tenant = 'X' and street = 'Y' and city = 'Z' 

To answer questions 1 and 2: Since TENANT is the partition key, it should not be in this index, and probably should not be in any index. That column is already used by partition pruning to select the relevant segment. That work is done at compile or parse time, and is virtually free.
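
One way to confirm that the index really is local, with one index partition per table partition, is to look at the data dictionary. This is only a sketch: it assumes the ADDRESS table and the ADDRESS_CITY_STREET_IDX index from this answer exist in the current schema, and it uses the USER_ views so no DBA privileges are needed.

```sql
-- LOCALITY should report LOCAL for an index created with the LOCAL keyword.
select index_name, table_name, partitioning_type, locality, alignment
from user_part_indexes
where index_name = 'ADDRESS_CITY_STREET_IDX';

-- A local index has one index partition per table partition.
select index_name, partition_name, status
from user_ind_partitions
where index_name = 'ADDRESS_CITY_STREET_IDX'
order by partition_position;
```

Because each index partition only holds rows for one TENANT value, adding TENANT as an index column would repeat the same value in every entry of that partition.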

The execution plans in the test case demonstrate that partition pruning is happening. The operation PARTITION LIST SINGLE, and the fact that the columns Pstart and Pstop list the number 3 instead of a variable like KEY, show that Oracle has already determined the partition before the query runs. Because Oracle instantly discards irrelevant TENANTs at compile time, there's no need to worry about further reducing the TENANTs at run time with an index.
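
For comparison, here is a hedged sketch of the same query with the tenant supplied as a bind variable instead of a literal. The bind name :t is hypothetical and only for illustration. In that case the plan would typically show KEY in Pstart and Pstop, meaning the partition is resolved when the statement executes rather than when it is parsed; pruning to a single partition should still happen.

```sql
-- :t is a hypothetical bind variable; EXPLAIN PLAN does not need a value for it.
explain plan for
select *
from address
where tenant = :t
  and street = 'Fake Street 50'
  and city = 'City 50';

select * from table(dbms_xplan.display);
```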


My index suggestion depends on a few assumptions about the data. Neither CITY nor STREET sounds like it would uniquely identify a row for a tenant, and STREET sounds much more selective than CITY. If a single CITY has many STREETs, then indexing the columns in the order (CITY, STREET) and compressing the repeated leading CITY value can save a lot of space.
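
Those assumptions can be checked directly against the data. The query below is a simple sketch that counts rows and distinct values per tenant; it assumes the ADDRESS table from the question and may be slow on a very large table.

```sql
-- Fewer distinct values relative to row_count means more repetition,
-- which is what prefix compression on the leading column exploits.
select tenant,
       count(*)               as row_count,
       count(distinct city)   as distinct_cities,
       count(distinct street) as distinct_streets
from address
group by tenant
order by tenant;
```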

If the index is significantly smaller it may have fewer levels, which means a lookup would require slightly fewer I/Os. And if it's smaller, more of it could fit in the buffer cache, which might further improve performance.

But with a table this large, I have a feeling the BLEVEL (number of index levels) will be the same for both, and both indexes will be too large to use the cache effectively. That means there may not be any performance difference between (CITY,STREET) and (STREET,CITY). But with (CITY,STREET) and compression you may at least be able to save a large amount of space.
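
Oracle can estimate how much space prefix compression would save for an existing index. The following is a sketch under a few assumptions: the index partition is named P3 (the default when the table partition is P3), and it is run on a test system, because VALIDATE STRUCTURE without ONLINE blocks DML on the index while it runs.

```sql
-- Analyze a single partition of the local index.
analyze index address_city_street_idx partition (p3) validate structure;

-- INDEX_STATS holds the result of the most recent validation in this session.
-- OPT_CMPR_COUNT is the suggested number of prefix columns to compress;
-- OPT_CMPR_PCTSAVE is the estimated percentage of space that would save.
select name, partition_name, height, lf_blks,
       opt_cmpr_count, opt_cmpr_pctsave
from index_stats;
```

HEIGHT in INDEX_STATS is BLEVEL plus one, so the same query also gives a per-partition view of the index depth.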

Test Case

I assume you cannot simply create both indexes on production and try them out. In that case you'll want to create some tests first.

This test case does not strongly support my suggestion. It is merely a starting point for a more thorough test case. You'll need to create one with a larger amount of data and a more realistic data distribution.
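
As one possible way to get a less uniform distribution, a partition can be reloaded with skewed random values once the sample table from the test case exists. This is only an illustration: the ranges (50000 streets, 500 cities) and the cubing trick are arbitrary assumptions, not measurements of real address data.

```sql
-- Optional: clear any evenly distributed rows already loaded for this tenant.
alter table address truncate partition p3;

insert /*+ append */ into address
select
    level,
    'Fake Street '||trunc(dbms_random.value(0, 50000)),
    -- Cubing a uniform value in [0,1) pushes most rows toward low city numbers.
    'City '||trunc(power(dbms_random.value(0, 1), 3) * 500),
    'State',
    'tenant3'
from dual connect by level <= 1000000;
commit;
```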

--Create sample table.
create table address
(
    id number,
    street varchar2(100),
    city varchar2(100),
    state varchar2(100),
    tenant varchar2(100)
) partition by list (tenant)
(
    partition p1 values ('tenant1'),
    partition p2 values ('tenant2'),
    partition p3 values ('tenant3'),
    partition p4 values ('tenant4'),
    partition p5 values ('tenant5')
) nologging;

--Insert 5M rows.
--Note the assumptions about the selectivity of the street and city
--are critical to this issue.  Adjust the MOD as necessary.
begin
    for i in 1 .. 5 loop
        insert /*+ append */ into address
        select
            level,
            'Fake Street '||mod(level, 10000),
            'City '||mod(level, 100),
            'State',
            'tenant'||i
        from dual connect by level <= 1000000;
        commit;
    end loop;
end;
/

--Table uses 282MB.
select sum(bytes)/1024/1024 mb from dba_segments where segment_name = 'ADDRESS' and owner = user;

--Create different indexes.
create index address_city_street_idx on address(city, street) compress 1 local;
create index address_street_city_idx on address(street, city) local;

--Gather statistics.
begin
    dbms_stats.gather_table_stats(user, 'ADDRESS');
end;
/

--Check execution plan.
--Oracle by default picks STREET,CITY over CITY,STREET.
--I'm not sure why.  And the cost difference is only 1, so I think things may be different with realistic data.
explain plan for select * from address where tenant = 'tenant3' and street = 'Fake Street 50' and city = 'City 50';
select * from table(dbms_xplan.display);

/*
Plan hash value: 2845844304

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                         |     1 |    44 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE                     |                         |     1 |    44 |     4   (0)| 00:00:01 |     3 |     3 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| ADDRESS                 |     1 |    44 |     4   (0)| 00:00:01 |     3 |     3 |
|*  3 |    INDEX RANGE SCAN                        | ADDRESS_STREET_CITY_IDX |     1 |       |     3   (0)| 00:00:01 |     3 |     3 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("STREET"='Fake Street 50' AND "CITY"='City 50')
*/

--Check execution plan of forced CITY,STREET index.
--I don't suggest using a hint in the real query, this is just to compare plans.
explain plan for select /*+ index(address address_city_street_idx) */ * from address where tenant = 'tenant3' and street = 'Fake Street 50' and city = 'City 50';
select * from table(dbms_xplan.display);

/*
Plan hash value: 1084849450

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                         |     1 |    44 |     5   (0)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE                     |                         |     1 |    44 |     5   (0)| 00:00:01 |     3 |     3 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| ADDRESS                 |     1 |    44 |     5   (0)| 00:00:01 |     3 |     3 |
|*  3 |    INDEX RANGE SCAN                        | ADDRESS_CITY_STREET_IDX |     1 |       |     3   (0)| 00:00:01 |     3 |     3 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CITY"='City 50' AND "STREET"='Fake Street 50')
*/

--Both indexes have BLEVEL=2.
select *
from dba_indexes
where index_name in ('ADDRESS_CITY_STREET_IDX', 'ADDRESS_STREET_CITY_IDX');

--CITY,STREET = 160MB, STREET,CITY=200MB.
--You can see the difference already.  It may get larger with different data distribution.
--And it may get larger with more data, as it may compress better with more repetition.
select segment_name, sum(bytes)/1024/1024 mb
from dba_segments
where segment_name in ('ADDRESS_CITY_STREET_IDX', 'ADDRESS_STREET_CITY_IDX')
group by segment_name;
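
Since the indexes are local, the depth and size can also be compared partition by partition. This sketch assumes the objects are in the current schema and that statistics have been gathered, because BLEVEL and LEAF_BLOCKS come from the optimizer statistics.

```sql
-- Per-partition depth, leaf blocks and segment size for both indexes.
select ip.index_name, ip.partition_name, ip.blevel, ip.leaf_blocks,
       s.bytes/1024/1024 as mb
from user_ind_partitions ip
join user_segments s
  on s.segment_name = ip.index_name
 and s.partition_name = ip.partition_name
where ip.index_name in ('ADDRESS_CITY_STREET_IDX', 'ADDRESS_STREET_CITY_IDX')
order by ip.partition_name, ip.index_name;
```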

#2


If the index is unique then you have to include TENANT to make it local. If it is not unique, do not include it, as it will not improve performance with LIST or RANGE partitioning. You could consider including it if the table is hash partitioned with many distinct values in one partition.
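
The unique-index rule can be seen with two statements against the test table. The index names here are hypothetical and chosen only for illustration.

```sql
-- Expected to fail with ORA-14039, because the partitioning column TENANT
-- is not part of the unique key.
create unique index address_id_uk on address(id) local;

-- Accepted: the partitioning column is included in the unique key.
create unique index address_tenant_id_uk on address(tenant, id) local;
```

With the sample data from the first answer, ID repeats across tenants, so (TENANT, ID) is unique while ID alone is not.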

UPD: However, it depends on what kind of partitioning you're using: "static" or "dynamic". "Static" is when all partitions are defined once in the create table statement and stay unchanged while the application is running. "Dynamic" is when the application adds or changes partitions (for example, a daily process that adds daily list partitions for all tables).

So you should avoid global indexes with "dynamic" partitioning: in that case a global index can become unusable when partitions are changed (for example, dropped, truncated, or split) unless the statement includes UPDATE GLOBAL INDEXES. With the "static" option it is fine to use a global index if you sometimes need to scan across all partitions.
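
Which operations leave a global index unusable depends on the operation and the Oracle version, so it is worth checking on a test copy. The sketch below uses a hypothetical global index name and truncates partitions, so it should only be run against the throwaway test table.

```sql
-- Hypothetical global (non-partitioned) index, for illustration only.
create index address_state_gidx on address(state);

-- Partition maintenance without an UPDATE INDEXES clause.
alter table address truncate partition p5;

-- STATUS shows whether the global index is still VALID or now UNUSABLE.
select index_name, status
from user_indexes
where index_name = 'ADDRESS_STATE_GIDX';

-- Rebuild if needed, then ask Oracle to maintain the index during later maintenance.
alter index address_state_gidx rebuild;
alter table address truncate partition p4 update global indexes;
```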
