Speeding up the splitting of date intervals into minutes in SQL Server

Time: 2022-01-28 05:43:59

I need to split some date intervals into minutes. (For example, the interval from 2012-01-01 10:00 to 2012-01-01 10:10 should be split into 2012-01-01 10:01, 2012-01-01 10:02, ... 2012-01-01 10:10.) Suppose there is a table like this:

CREATE TABLE [dbo].[Events](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [EventStart] [datetime] NOT NULL,
    [EventEnd] [datetime] NOT NULL,
    [Amount] [float] NOT NULL,
 CONSTRAINT [PK_Events] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

The table is populated like this:

DECLARE @i integer = 0;
DECLARE @initial_date datetime = '2012-01-01';

WHILE @i < 50000
BEGIN
    INSERT INTO [Events] (EventStart, EventEnd, Amount) VALUES (DATEADD(MINUTE, 10*@i, @initial_date), DATEADD(MINUTE, 10*(@i + 1), @initial_date), @i);
    SET @i = @i + 1;
END

As a result, we have 50,000 consecutive 10-minute intervals.
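
A quick count of what this data produces (my own arithmetic, derived from the loop above and the CTE below). The 50,000 intervals span 500,000 minutes, about 347 days, so every event starts inside the 2012-01-01 to 2013-01-02 window used in the query. Because the CTE emits both endpoints of each interval, every interval expands to 11 rows:

```latex
50\,000 \times 10\ \text{min} = 500\,000\ \text{min} \approx 347.2\ \text{days}
\qquad
50\,000 \times (10 + 1) = 550\,000\ \text{rows}
```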

To split them into minutes, I use the following recursive CTE:

DECLARE @start_date datetime = '2012-01-01';
DECLARE @end_date datetime = '2013-01-02';


WITH Date_Ranges (StatDate, Amount, IntervalStart, CurrentMinute) AS (
  SELECT 
    DATEADD(MINUTE, 0,  ev.EventStart) AS StatDate, ev.Amount, ev.EventStart AS IntervalStart, 1 AS CurrentMinute
  FROM [Events] ev
  WHERE ev.EventStart BETWEEN @start_date AND @end_date
  UNION ALL
  SELECT 
    DATEADD(MINUTE, CurrentMinute, ev.EventStart), ev.Amount, ev.EventStart AS IntervalStart, CurrentMinute + 1
  FROM [Events] ev
  INNER JOIN Date_Ranges ranges ON (ranges.IntervalStart = ev.EventStart AND 
    ranges.StatDate >= ev.EventStart AND 
    ranges.StatDate < ev.EventEnd)
    WHERE DATEADD(MINUTE, CurrentMinute, ev.EventStart) BETWEEN @start_date AND @end_date AND
        ev.EventStart BETWEEN @start_date AND @end_date
) 

SELECT *
FROM Date_Ranges --ORDER BY StatDate

The main problem is that this recursive CTE runs too slowly on large amounts of data.

So, how can I speed this up?

2 solutions

#1

This returns all 550,000 rows in roughly half the time of the recursive CTE.

DECLARE @start_date datetime = '2012-01-01'; 
DECLARE @end_date datetime = '2013-01-02';

SELECT  DATEADD(MINUTE, x.number, ev.EventStart) AS StartDate,
        ev.Amount,
        ev.EventStart AS IntervalStart,
        x.number AS CurrentMinute
-- spt_values rows with type = 'P' act as a numbers table (0 through 2047)
FROM    master.dbo.spt_values x
CROSS JOIN Events ev
WHERE   x.type = 'P'
-- keep one number per minute of the interval, both endpoints included
AND     x.number <= DATEDIFF(MINUTE, ev.EventStart, ev.EventEnd)
AND     ev.EventStart BETWEEN @start_date AND @end_date
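
master.dbo.spt_values is an undocumented system table, and its type = 'P' rows only cover the integers 0 to 2047, so intervals longer than 2047 minutes would be silently truncated. Below is a minimal sketch of the same query using an inline numbers table built from a table value constructor instead. This substitution is mine, not part of the original answer. It assumes the Events table and data from the question and SQL Server 2008 or later, and the CTE names Ten and Tally are illustrative:

```sql
DECLARE @start_date datetime = '2012-01-01';
DECLARE @end_date   datetime = '2013-01-02';

-- Illustrative inline tally: 10 x 10 x 10 x 10 digits give the numbers 0..9999,
-- replacing master.dbo.spt_values and its 0..2047 limit.
WITH Ten (n) AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
),
Tally (number) AS (
    SELECT a.n + 10 * b.n + 100 * c.n + 1000 * d.n
    FROM Ten a CROSS JOIN Ten b CROSS JOIN Ten c CROSS JOIN Ten d
)
-- One output row per minute of each interval, both endpoints included.
SELECT  DATEADD(MINUTE, t.number, ev.EventStart) AS StatDate,
        ev.Amount,
        ev.EventStart AS IntervalStart,
        t.number AS CurrentMinute
FROM    Events ev
JOIN    Tally t
        ON t.number <= DATEDIFF(MINUTE, ev.EventStart, ev.EventEnd)
WHERE   ev.EventStart BETWEEN @start_date AND @end_date;
```

This tally handles intervals of up to 9,999 minutes. Another CROSS JOIN of Ten extends the range tenfold.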

#2

I think the fastest source of buckets is a table. Create a table of 10-minute buckets, populate it, then join against it. This avoids recursion entirely and leverages one of the things a SQL DBMS is really good at: joins.

Ten years of 10-minute buckets is only about half a million rows.

Recursion in a CTE is a Good Thing when you're dealing with something like a bill of materials. But it's not always a suitable substitute for a table.
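
Here is a minimal T-SQL sketch of the bucket-table idea, built under my own assumptions. The answer describes 10-minute buckets, but because the question asks for one row per minute, this sketch uses one-minute granularity. The table name dbo.Minutes, its column MinuteStart, and the Ten/Tally CTEs used to populate it are all hypothetical, and Events is the table from the question. The bucket table is filled once with a set-based insert and then reused by a plain range join:

```sql
DECLARE @start_date datetime = '2012-01-01';
DECLARE @end_date   datetime = '2013-01-02';

-- Hypothetical bucket table: one row per minute, clustered on the timestamp
-- so range predicates against it can seek.
CREATE TABLE dbo.Minutes (
    MinuteStart datetime NOT NULL CONSTRAINT PK_Minutes PRIMARY KEY CLUSTERED
);

-- Populate it once, set-based: six cross-joined digit sets give 0..999,999,
-- more than the 528,480 minutes between the two dates.
WITH Ten (n) AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
),
Tally (number) AS (
    SELECT a.n + 10 * b.n + 100 * c.n + 1000 * d.n + 10000 * e.n + 100000 * f.n
    FROM Ten a CROSS JOIN Ten b CROSS JOIN Ten c
         CROSS JOIN Ten d CROSS JOIN Ten e CROSS JOIN Ten f
)
INSERT INTO dbo.Minutes (MinuteStart)
SELECT DATEADD(MINUTE, number, @start_date)
FROM Tally
WHERE number <= DATEDIFF(MINUTE, @start_date, @end_date);

-- Split every event into minutes with an ordinary join, no recursion.
-- Both endpoints are included, matching the recursive CTE's output.
SELECT  m.MinuteStart AS StatDate,
        ev.Amount,
        ev.EventStart AS IntervalStart
FROM    Events ev
JOIN    dbo.Minutes m
        ON  m.MinuteStart >= ev.EventStart
        AND m.MinuteStart <= ev.EventEnd
WHERE   ev.EventStart BETWEEN @start_date AND @end_date;
```

Because dbo.Minutes is a permanent table, later queries only pay for the join. Events that end after the populated range would need the table to be extended first.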

I created a table of 10-minute buckets covering 10 years. (That's about 4 megabytes of data; I didn't try to calculate how much disk space the indexes and row overhead take.) Then I created a table of test data containing 20 million random timestamps, all within the same 10 years as the bucket table.
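
Both size figures check out with simple arithmetic. This assumes 365.25-day years and an 8-byte timestamp per bucket (PostgreSQL's timestamp and SQL Server's datetime are both 8 bytes), and it ignores row and index overhead, as the answer does:

```latex
10\ \text{years} \times 365.25\ \tfrac{\text{days}}{\text{year}} \times 144\ \tfrac{\text{buckets}}{\text{day}} = 525\,960\ \text{buckets}
\qquad
525\,960 \times 8\ \text{bytes} = 4\,207\,680\ \text{bytes} \approx 4.2\ \text{MB}
```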

After adding indexes appropriate to the problem, the test system buckets one random day's data in about 100 ms. (This was PostgreSQL, untuned, running on a 5-year-old Dell computer with 1 GB of RAM. I'm on a Linux system here, so I couldn't test SQL Server itself, though I'd expect similar results.)
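
The answer doesn't say which indexes it used. For the SQL Server schema in the question, one plausible choice is a nonclustered index on EventStart that also carries the other columns the queries read. This is my assumption, and the index name is hypothetical. With this index, the EventStart BETWEEN @start_date AND @end_date filter can seek instead of scanning the clustered index on ID:

```sql
-- Hypothetical supporting index for the EventStart range filter.
-- INCLUDE adds the other columns the queries read (EventEnd, Amount),
-- so the index covers them without lookups into the clustered index.
CREATE NONCLUSTERED INDEX IX_Events_EventStart
    ON dbo.Events (EventStart)
    INCLUDE (EventEnd, Amount);
```

This pays off mainly for narrow ranges, such as the one-day query timed in the answer. When the range covers the whole table, as in the question's full-year query, a scan is just as good.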
