SQL Server性能和索引视图

时间:2022-07-29 07:13:53

Using SQL Server 2008.

使用SQL Server 2008。

(Sorry if this turns out to be an article but I'm trying to give as much info as possible.)

(对不起,如果这是一篇文章,但我想尽可能多地提供信息。)

I have multiple locations which each contain multiple departments which each contain multiple Items which can have zero to many scans. Each scan relates to a specific operation which may or may not have a cutoff time. Each item also belongs to a specific package which belongs to a specific job which belongs to a specific project with belongs to a specific client. Each job contains one or more packages which contains one or more items.

我有多个位置,每个位置包含多个部门,每个部门包含多个可以具有零到多扫描的项目。每次扫描涉及可能具有或不具有截止时间的特定操作。每个项目也属于特定包,该包属于属于特定客户的特定项目的特定作业。每个作业包含一个或多个包含一个或多个项目的包。

                                        +=============+     +=============+
                                        |   Projects  | --> |   Clients   |
                                        +=============+     +=============+
                                              ^
                                              |
+=============+                         +=============+
|  Locations  |                         |     Jobs    |
+=============+                         +=============+
      ^                                       ^
      |                                       |
+=============+     +=============+     +=============+
| Departments | <-- |    Items    | --> |   Packages  |
+=============+     +=============+     +=============+
                          ^
                          |
                    +=============+     +=============+
                    |    Scans    | --> | Operations  |
                    +=============+     +=============+

There are roughly 24,000,000 records in the items table and roughly 48,000,000 records in the scans table. New items are sporadically bulk inserted into the database throughout the day, usually in the tens of thousands at a pop. New scans are bulk inserted every hour, anywhere from a few hundred to a few hundred thousand per.

items表中大约有24,000,000条记录,扫描表中大约有48,000,000条记录。新项目全天偶尔会大量插入数据库,通常是成千上万的流行音乐。每小时批量插入新扫描,每个扫描数量从几百到几十万。

These tables are heavily queried, sliced and diced every which way. I was writing very specific stored procs but it turned into a maintenance nightmare as I was on the verge of a hundred stored procs with no end in site (e.g. something akin to ScansGetDistinctCountByProjectIDByDepartmentIDGroupedByLocationID, ScansGetDistinctCountByPackageIDByDepartmentIDGroupedByLocationID, etc.) As luck would have it, the requirements change (what feels like) almost daily and every time I have to change/add/delete a column, well...I end up at the bar.

这些表格在每个方向都经过严格查询,切片和切块。我正在编写非常具体的存储过程,但它变成了一个维护噩梦,因为我处于一百个存储过程的边缘,在网站中没有结束(例如类似于ScansGetDistinctCountByProjectIDByDepartmentIDGroupedByLocationID,ScansGetDistinctCountByPackageIDByDepartmentIDGroupedByLocationID等等)。幸运的是,它的要求几乎每天都有变化(感觉像是什么),每当我必须更改/添加/删除一个列时,嗯...我最终会在酒吧。

So I created an indexed view and a handful of generic stored procs with parameters to determine filtering and grouping. Unfortunately, performance went down the toilet. I guess the first question is, since select performance is paramount, should I just stick with the specific approach and fight through changes to the underlying tables? Or, can something be done to speed up the indexed view/generic query approach? On top of easing the maintenance nightmare, I was actually hoping that the indexed view would improve performance as well.

所以我创建了一个索引视图和一些带有参数的通用存储过程来确定过滤和分组。不幸的是,性能下降了。我想第一个问题是,因为选择性能是最重要的,我应该坚持使用特定方法并通过对基础表的更改进行斗争吗?或者,是否可以采取措施加快索引视图/通用查询方法?除了缓解维护噩梦之外,我实际上希望索引视图也能提高性能。

Here is the code to generate the view:

以下是生成视图的代码:

CREATE VIEW [ItemScans] WITH SCHEMABINDING AS

SELECT
    p.ClientID          
    , p.ID        AS [ProjectID]            
    , j.ID        AS [JobID]
    , pkg.ID      AS [PackageID]
    , i.ID        AS [ItemID]       
    , s.ID        AS [ScanID]
    , s.DateTime
    , o.Code
    , o.Cutoff
    , d.ID        AS [DepartmentID]
    , d.LocationID
    -- other columns
FROM
    [Projects] AS p
    INNER JOIN [Jobs] AS j
        ON p.ID = j.ProjectID
    INNER JOIN [Packages] AS pkg
        ON j.ID = pkg.JobID
    INNER JOIN [Items] AS i
        ON pkg.ID = i.PackageID
    INNER JOIN [Scans] AS s
        ON i.ID = s.ItemID
    INNER JOIN [Operations] AS o
        ON s.OperationID = o.ID
    INNER JOIN [Departments] AS d
        ON i.DepartmentID = d.ID;   

and the clustered index:

和聚集索引:

CREATE UNIQUE CLUSTERED INDEX [IDX_ItemScans] ON [ItemScans]
(
    [PackageID] ASC,
    [ItemID] ASC,
    [ScanID] ASC
)

Here's one of the generic stored procs. It gets a count of items that have been scanned and have a cutoff:

这是一个通用的存储过程。它会计算已扫描并具有截止值的项目:

PROCEDURE [ItemsGetFinalizedCount] 
    @FilterBy       int = NULL
    , @ID           int = NULL
    , @FilterBy2    int = NULL 
    , @ID2          sql_variant = NULL  
    , @GroupBy      int = NULL        
WITH RECOMPILE
AS
BEGIN

    SELECT
        CASE @GroupBy           
            WHEN 1 THEN
                CONVERT(sql_variant, LocationID)
            WHEN 2 THEN
                CONVERT(sql_variant, DepartmentID)
            -- other cases
       END AS [ID]
       , COUNT(DISTINCT ItemID) AS [COUNT]
    FROM
        [ItemScans] WITH (NOEXPAND)
    WHERE       
        (@ID IS NULL OR
        @ID = CASE @FilterBy            
            WHEN 1 THEN         
                ClientID
            WHEN 2 THEN
                ProjectID
            -- other cases
        END) 
        AND (@ID2 IS NULL OR
        @ID2 = CASE @FilterBy2          
            WHEN 1 THEN         
                CONVERT(sql_variant, ClientID)
            WHEN 2 THEN
                CONVERT(sql_variant, ProjectID)
            -- other cases
        END)
        AND Cutoff IS NOT NULL
    GROUP BY
        CASE @GroupBy           
            WHEN 1 THEN
                CONVERT(sql_variant, LocationID) 
            WHEN 2 THEN
                CONVERT(sql_variant, DepartmentID)
            -- other cases
        END
END

The first time I ran the query and looked at the actual execution plan, I created the missing index that it suggested:

我第一次运行查询并查看实际执行计划时,创建了它建议的缺失索引:

CREATE NONCLUSTERED INDEX [IX_ItemScans_Counts] ON [ItemScans]
(
    [Cutoff] ASC
)
INCLUDE ([ClientID],[ProjectID],[JobID],[ItemID],[SegmentID],[DepartmentID],[LocationID]) 

Creating the index took the execution time down to about five seconds but that is still unacceptable (the "specific" version of the query runs subsecond.) I've tried adding different columns to the index instead of just including them with no gain in performance (doesn't really help that I have no idea what I am doing at this point.)

创建索引会将执行时间缩短到大约五秒,但这仍然是不可接受的(查询的“特定”版本在亚秒运行。)我尝试在索引中添加不同的列而不是仅包含它们而不会提高性能(并不能真正帮助我不知道我现在在做什么。)

Here is the query plan:

这是查询计划:

SQL Server性能和索引视图

And here are the details for that first index seek (it appears to return all of the rows in the view where Cutoff IS NOT NULL):

以下是第一个索引搜索的详细信息(它似乎返回视图中Cutoff IS NOT NULL的所有行):

SQL Server性能和索引视图

1 个解决方案

#1


3  

A generic proc may not be a bad idea in this case, but you don't have to put all those cases into the final query as you are currently doing. I would try building your "specific queries" using dynamic SQL in your generic proc, much the same way Gail Shaw builds a "catch-all" query here:

在这种情况下,通用过程可能不是一个坏主意,但您不必像目前那样将所有这些案例放入最终查询中。我会尝试在你的泛型proc中使用动态SQL构建你的“特定查询”,就像Gail Shaw在这里建立一个“catch-all”查询一样:

SQL in the Wild - Catch-all queries

野外的SQL - Catch-all查询

This way, you're able to cache query plans and utilize indexes as shown in the blog post, and you should be able to get that same sub-second performance that you're after.

这样,您就可以缓存查询计划并利用博客文章中显示的索引,您应该能够获得与之相同的亚秒级性能。

#1


3  

A generic proc may not be a bad idea in this case, but you don't have to put all those cases into the final query as you are currently doing. I would try building your "specific queries" using dynamic SQL in your generic proc, much the same way Gail Shaw builds a "catch-all" query here:

在这种情况下,通用过程可能不是一个坏主意,但您不必像目前那样将所有这些案例放入最终查询中。我会尝试在你的泛型proc中使用动态SQL构建你的“特定查询”,就像Gail Shaw在这里建立一个“catch-all”查询一样:

SQL in the Wild - Catch-all queries

野外的SQL - Catch-all查询

This way, you're able to cache query plans and utilize indexes as shown in the blog post, and you should be able to get that same sub-second performance that you're after.

这样,您就可以缓存查询计划并利用博客文章中显示的索引,您应该能够获得与之相同的亚秒级性能。