Sum of minutes between multiple date ranges

The scenario is that users specify when they are available, and these specified times can overlap each other. I'm trying to get the total time they are available for. Example with SQL Fiddle:

--Available--
ID  userID  availStart          availEnd
1   456     '2012-11-19 16:00'  '2012-11-19 17:00'
2   456     '2012-11-19 16:00'  '2012-11-19 16:50'
3   456     '2012-11-19 18:00'  '2012-11-19 18:30'
4   456     '2012-11-19 17:30'  '2012-11-19 18:10'
5   456     '2012-11-19 16:00'  '2012-11-19 17:10'
6   456     '2012-11-19 16:00'  '2012-11-19 16:50'

The output should be 130 minutes:

1: 60
2: 0, as it falls inside 1
3: 30
4: 30, as the last 10 minutes are covered by 3
5: 10, as the first 60 minutes are covered by 1
6: 0, as it falls inside 1
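
One quick way to confirm the expected 130 is to treat each range as a set of one-minute slots, with the start inclusive and the end exclusive, and then count the distinct slots covered. Below is a small Python sketch of that check on the sample rows. It is only a verification aid, not a proposed query, and the names `ROWS` and `covered_minutes` are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# Sample rows from the question for userID 456: (ID, availStart, availEnd)
ROWS = [
    (1, "2012-11-19 16:00", "2012-11-19 17:00"),
    (2, "2012-11-19 16:00", "2012-11-19 16:50"),
    (3, "2012-11-19 18:00", "2012-11-19 18:30"),
    (4, "2012-11-19 17:30", "2012-11-19 18:10"),
    (5, "2012-11-19 16:00", "2012-11-19 17:10"),
    (6, "2012-11-19 16:00", "2012-11-19 16:50"),
]


def covered_minutes(rows):
    """Return the set of minute slots covered by at least one row (end is exclusive)."""
    slots = set()
    for _id, start, end in rows:
        t = datetime.strptime(start, FMT)
        stop = datetime.strptime(end, FMT)
        while t < stop:
            slots.add(t)  # a set, so a minute covered by several rows counts once
            t += timedelta(minutes=1)
    return slots


print(len(covered_minutes(ROWS)))  # 130
```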

I can get the total overlapping minutes; however, this is more than the SUM of the available minutes:
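
A likely reason the overlap total overshoots is that pairwise overlaps double-count. When three or more ranges cover the same minute, that minute appears in several pairs. The original query is only in the linked fiddle, so this reading is an assumption. A short Python check on the sample data shows that the sum of all pairwise overlaps (320) is larger than the sum of the row lengths (300), so subtracting one from the other cannot give 130.

```python
from datetime import datetime
from itertools import combinations

FMT = "%Y-%m-%d %H:%M"

# Sample ranges from the question (userID 456), in ID order
RANGES = [
    ("2012-11-19 16:00", "2012-11-19 17:00"),
    ("2012-11-19 16:00", "2012-11-19 16:50"),
    ("2012-11-19 18:00", "2012-11-19 18:30"),
    ("2012-11-19 17:30", "2012-11-19 18:10"),
    ("2012-11-19 16:00", "2012-11-19 17:10"),
    ("2012-11-19 16:00", "2012-11-19 16:50"),
]
parsed = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in RANGES]


def minutes_between(start, end):
    return int((end - start).total_seconds() // 60)


def pairwise_overlap(a, b):
    """Minutes shared by two ranges, or 0 if they don't intersect."""
    return max(0, minutes_between(max(a[0], b[0]), min(a[1], b[1])))


sum_of_lengths = sum(minutes_between(s, e) for s, e in parsed)
sum_of_overlaps = sum(pairwise_overlap(a, b) for a, b in combinations(parsed, 2))
print(sum_of_lengths, sum_of_overlaps)  # 300 320
```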

SQL Fiddle

Any ideas how I can achieve this?

EDIT 21st Nov 12: Thanks to everyone for the solutions so far. In a way I'm happy to see this wasn't an 'easy' query to write.

EDIT 23rd Nov 12: This is all great work. Internally we're thinking it might be best to ensure users cannot enter overlapping times (e.g. by forcing them to amend an existing entry)!
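
If overlapping entries are rejected instead, the check for a new entry is the standard interval test: two ranges overlap exactly when each one starts before the other ends. Below is a minimal Python sketch of that check. The function name `overlaps_existing` and the helper `t` are hypothetical. The sketch also assumes end times are exclusive, so back-to-back entries are allowed.

```python
from datetime import datetime


def overlaps_existing(new_start, new_end, existing):
    """True if [new_start, new_end) intersects any [start, end) already stored for the user.

    Two half-open ranges overlap exactly when each one starts before the other ends,
    so back-to-back entries such as 16:00-17:00 and 17:00-17:30 are allowed.
    """
    return any(new_start < end and start < new_end for start, end in existing)


def t(hhmm):
    # Hypothetical helper: a time on the question's sample date
    return datetime.strptime("2012-11-19 " + hhmm, "%Y-%m-%d %H:%M")


existing = [(t("16:00"), t("17:10")), (t("17:30"), t("18:30"))]
print(overlaps_existing(t("17:00"), t("17:20"), existing))  # True
print(overlaps_existing(t("17:10"), t("17:30"), existing))  # False
```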

5 solutions

#1

Gordon Linoff has a CTE-based answer.

I've done some performance analysis on all the working algorithms. Blank values mean a run took too long. This was tested on a single Core i7 X920 @ 2GHz chip, backed by a couple of SSDs. The only index created was a clustered index on (UserID, AvailStart). If you think you can improve the performance of any of these, let me know.

This CTE version scaled worse than linearly, because SQL Server can't perform the RN = RN + 1 join efficiently. I rectified this with the hybrid approach below, which saves the first CTE into an indexed table variable. It still takes ten times as much IO as the cursor-based approach.

With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = 456
),
AccumulateMinutes (RN, Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    OrderedRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    OrderedRanges o On 
        a.RN = o.RN - 1
)

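Select Max(Accum + datediff(Minute, CurStart, CurEnd)) From AccumulateMinutes
Option (MaxRecursion 0) -- the default limit of 100 recursion levels would fail for users with more than about 100 ranges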

http://sqlfiddle.com/#!6/ac021/2
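
To see what the recursive CTE carries from row to row, here is a Python sketch of the same logic. It is an illustration, not the T-SQL itself, and it assumes a single user's ranges. It produces one (RN, Accum, CurStart, CurEnd) row per range, like AccumulateMinutes. Accum plus the current range's length never decreases from one row to the next, so the Max in the final Select is simply the value on the last row. The helper `t` is hypothetical.

```python
from datetime import datetime


def t(hhmm):
    # Hypothetical helper: a time on the question's sample date
    return datetime.strptime("2012-11-19 " + hhmm, "%Y-%m-%d %H:%M")


def minutes_between(start, end):
    return int((end - start).total_seconds() // 60)


def accumulate_minutes(ranges):
    """Return one (RN, Accum, CurStart, CurEnd) row per range, like the AccumulateMinutes CTE."""
    rows = []
    for rn, (start, end) in enumerate(sorted(ranges), start=1):  # ROW_NUMBER ... ORDER BY AvailStart
        if rn == 1:
            accum, cur_start, cur_end = 0, start, end  # anchor member
        elif start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps: widen the current range
        else:
            accum += minutes_between(cur_start, cur_end)  # gap: bank the finished range
            cur_start, cur_end = start, end
        rows.append((rn, accum, cur_start, cur_end))
    return rows


sample = [
    (t("16:00"), t("17:00")), (t("16:00"), t("16:50")), (t("18:00"), t("18:30")),
    (t("17:30"), t("18:10")), (t("16:00"), t("17:10")), (t("16:00"), t("16:50")),
]
rows = accumulate_minutes(sample)
total = max(accum + minutes_between(s, e) for _, accum, s, e in rows)
print(total)  # 130
```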

After doing some performance analysis, here's a hybrid CTE/table variable version that performs better than anything except the cursor-based approach:

Create Function dbo.AvailMinutesHybrid(@UserID int) Returns Int As
Begin

Declare @UserRanges Table (
  RN int not null primary key, 
  AvailStart datetime, 
  AvailEnd datetime
)
Declare @Ret int = Null

;With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = @UserID
)
Insert Into @UserRanges Select * From OrderedRanges


;With AccumulateMinutes (RN,Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    @UserRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    @UserRanges o On 
        a.RN + 1 = o.RN
)

Select 
  @Ret = Max(Accum + datediff(Minute, CurStart, CurEnd)) 
From 
  AccumulateMinutes 
Option
  (MaxRecursion 0)

Return @Ret

End

http://sqlfiddle.com/#!6/bfd94

#2

Here's another way of doing it, with a cursor. I feel this technique should be adaptable to a CTE, but I can't figure out how to do it.

The method is to order the ranges by start time. We then build a range that coalesces ranges in order, until we find a range that doesn't overlap our coalesced range. We then calculate how many minutes are in the coalesced range and remember this. We carry on with the next ranges, again coalescing any that overlap, and accumulate minutes each time we reach a non-overlapping start point. At the end, we add the accumulated minutes to the length of the last coalesced range.

Because of the ordering, once a range is distinct from what came before it, no later range can overlap what came before either. Their start dates are all at least as late as this range's start, which is already past the end of the earlier coalesced range.
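
The coalescing walk described above can be sketched in Python as a generator that yields each coalesced range, followed by a sum of their lengths. This is a sketch of the cursor's logic for one user's ranges, not a replacement for the T-SQL. The names `coalesce`, `total_minutes` and the helper `t` are hypothetical.

```python
from datetime import datetime


def t(hhmm):
    # Hypothetical helper: a time on the question's sample date
    return datetime.strptime("2012-11-19 " + hhmm, "%Y-%m-%d %H:%M")


def coalesce(ranges):
    """Yield the coalesced (start, end) ranges, walking the input in start order like the cursor."""
    cur_start = cur_end = None
    for start, end in sorted(ranges):
        if cur_start is None:
            cur_start, cur_end = start, end  # first row starts the first coalesced range
        elif start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps: extend the coalesced range
        else:
            yield cur_start, cur_end  # distinct: the previous range is final
            cur_start, cur_end = start, end
    if cur_start is not None:
        yield cur_start, cur_end  # the last coalesced range


def total_minutes(ranges):
    return sum(int((e - s).total_seconds() // 60) for s, e in coalesce(ranges))


sample = [
    (t("16:00"), t("17:00")), (t("16:00"), t("16:50")), (t("18:00"), t("18:30")),
    (t("17:30"), t("18:10")), (t("16:00"), t("17:10")), (t("16:00"), t("16:50")),
]
print(total_minutes(sample))  # 130
```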

Declare
  @UserID int = 456,
  @CurStart datetime, -- our current coalesced range start
  @CurEnd datetime, -- our current coalesced range end
  @AvailStart datetime, -- start of range for our next row of data
  @AvailEnd datetime, -- end of range for our next row of data
  @AccumMinutes int = 0 -- how many minutes so far accumulated by distinct ranges

Declare MinCursor Cursor Fast_Forward For
Select
  AvailStart, AvailEnd
From
  dbo.Available
Where
  UserID = @UserID
Order By
  AvailStart

Open MinCursor

Fetch Next From MinCursor Into @AvailStart, @AvailEnd
Set @CurStart = @AvailStart
Set @CurEnd = @AvailEnd

While @@Fetch_Status = 0
Begin
  If @AvailStart <= @CurEnd -- Ranges overlap, so coalesce and continue
  Begin
    If @AvailEnd > @CurEnd
      Set @CurEnd = @AvailEnd
  End
  Else -- Distinct range, so accumulate minutes from the previous coalesced range
  Begin
    Set @AccumMinutes = @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd)
    Set @CurStart = @AvailStart -- Start coalescing a new range
    Set @CurEnd = @AvailEnd
  End
  Fetch Next From MinCursor Into @AvailStart, @AvailEnd
End

Select @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd) As TotalMinutes

Close MinCursor
Deallocate MinCursor;

http://sqlfiddle.com/#!6/3483c/15

#3

The main problem is that you can have chains of overlapping entries, so you need to merge an indefinite number of times to remove all the overlap; this is better suited to a procedural method than to SQL. But if you would prefer not to use temporary tables, here's a CTE method. Keep in mind that a recursive CTE stops with an error after 100 recursion levels by default (the MAXRECURSION query hint raises this to at most 32767, or 0 for no limit), so a particularly long chain will make it fail unless you raise the limit.

WITH MergedAvailable
AS
(
  SELECT Available.UserID, Available.AvailStart, MAX(Available.AvailEnd) AS AvailEnd
    FROM Available
   WHERE (
           SELECT COUNT(*)
             FROM Available AS InnerAvailable
            WHERE InnerAvailable.UserID = Available.UserID -- only the same user's ranges can overlap
                  AND
                  InnerAvailable.AvailStart < Available.AvailStart
                  AND
                  InnerAvailable.AvailEnd >= Available.AvailStart
         ) = 0
   GROUP BY Available.UserID, Available.AvailStart
  UNION ALL
  SELECT MergedAvailable.UserID, MergedAvailable.AvailStart,
         LongestExtensionToAvailableInterval.NewIntervalEnd
    FROM MergedAvailable
   CROSS APPLY GetLongestExtensionToAvailableInterval(MergedAvailable.UserID,
               MergedAvailable.AvailStart,
               MergedAvailable.AvailEnd) AS LongestExtensionToAvailableInterval
   WHERE LongestExtensionToAvailableInterval.NewIntervalEnd IS NOT NULL
)

SELECT SUM(DATEDIFF(MINUTE,
                    FinalAvailable.AvailStart,
                    FinalAvailable.AvailEnd)) AS MinsAvailable
  FROM (
         SELECT MergedAvailable.UserID, MergedAvailable.AvailStart,
                MAX(MergedAvailable.AvailEnd) AS AvailEnd
           FROM MergedAvailable
          GROUP BY MergedAvailable.UserID, MergedAvailable.AvailStart
       ) AS FinalAvailable

This table function is required:

CREATE FUNCTION GetLongestExtensionToAvailableInterval
(
  @UserID int,
  @CurrentIntervalStart datetime,
  @CurrentIntervalEnd datetime
)
RETURNS TABLE
AS
RETURN 
  SELECT MAX(Available.AvailEnd) AS NewIntervalEnd
    FROM Available
   WHERE Available.UserID = @UserID
         AND
         Available.AvailStart > @CurrentIntervalStart
         AND
         Available.AvailStart <= @CurrentIntervalEnd
         AND
         Available.AvailEnd > @CurrentIntervalEnd

The general idea is that it starts from all ranges whose start isn't overlapped by any earlier-starting range. With every recursion, it then extends the current range to the furthest end of the ranges currently overlapping it. The table function is needed to determine that furthest end, because the recursive part of a CTE is not allowed to include plain aggregates.
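
Here is a Python sketch of those two steps for a single user, written to mirror the anchor member and GetLongestExtensionToAvailableInterval. It is not the T-SQL itself. It returns every row the MergedAvailable CTE would produce, including intermediate stages. The names `merged_available` and `t` are hypothetical.

```python
from datetime import datetime


def t(hhmm):
    # Hypothetical helper: a time on the question's sample date
    return datetime.strptime("2012-11-19 " + hhmm, "%Y-%m-%d %H:%M")


def merged_available(ranges):
    """Return every (start, end) row the MergedAvailable CTE would produce for one user."""
    # Anchor member: ranges whose start no earlier-starting range reaches,
    # keeping the latest end for each start (GROUP BY AvailStart, MAX(AvailEnd))
    anchors = {}
    for s, e in ranges:
        if not any(s2 < s and e2 >= s for s2, e2 in ranges):
            anchors[s] = max(e, anchors.get(s, e))
    rows = sorted(anchors.items())
    frontier = rows
    # Recursive member: extend each new row to the furthest end among ranges that
    # start inside it and stick out past its end (the table function's MAX)
    while frontier:
        next_rows = []
        for start, end in frontier:
            ends = [e2 for s2, e2 in ranges if start < s2 <= end and e2 > end]
            if ends:  # NewIntervalEnd IS NOT NULL
                next_rows.append((start, max(ends)))
        rows = rows + next_rows
        frontier = next_rows
    return rows


sample = [
    (t("16:00"), t("17:00")), (t("16:00"), t("16:50")), (t("18:00"), t("18:30")),
    (t("17:30"), t("18:10")), (t("16:00"), t("17:10")), (t("16:00"), t("16:50")),
]
print(merged_available(sample))
```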

With the data you've provided, the starting rows are:

456 2012-11-19 16:00 2012-11-19 17:10
456 2012-11-19 17:30 2012-11-19 18:10

The only row which ends up being added via the recursion is:

456 2012-11-19 17:30 2012-11-19 18:30

For the sake of the example, say you had a row with ID 7 which went from 18:20 to 19:20. Then there would be a second recursion which brought back the row:

456 2012-11-19 17:30 2012-11-19 19:20

So while the query does reach the start and end of each merged range, it also returns all the intermediate stages. This is why, after the CTE, we take the maximum end date for each start date: to remove them.
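
The final GROUP BY step can be sketched in Python as well. The rows below are the MergedAvailable output for the sample data plus the hypothetical row 7 (18:20 to 19:20) from the example above. Keeping only the latest end per start gives 180 minutes. Summing every row, intermediate stages included, would give 280. The names `collapse_and_sum` and `t` are hypothetical.

```python
from datetime import datetime


def t(hhmm):
    # Hypothetical helper: a time on the question's sample date
    return datetime.strptime("2012-11-19 " + hhmm, "%Y-%m-%d %H:%M")


def collapse_and_sum(rows):
    """Keep the latest end per start (GROUP BY AvailStart, MAX(AvailEnd)), then total the minutes."""
    latest_end = {}
    for start, end in rows:
        if start not in latest_end or end > latest_end[start]:
            latest_end[start] = end
    return sum(int((end - start).total_seconds() // 60) for start, end in latest_end.items())


# MergedAvailable rows for the sample data plus the hypothetical 18:20-19:20 row
rows = [
    (t("16:00"), t("17:10")),
    (t("17:30"), t("18:10")),  # anchor
    (t("17:30"), t("18:30")),  # first recursion
    (t("17:30"), t("19:20")),  # second recursion
]
print(collapse_and_sum(rows))  # 180
print(sum(int((e - s).total_seconds() // 60) for s, e in rows))  # 280 if the intermediate stages were kept
```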

#4

The condition t1.availStart > t2.availEnd OR t1.availEnd < t2.availStart identifies two periods that never intersect. If they do intersect, the merged period runs from the smaller availStart to the larger availEnd; otherwise each period keeps its own availStart and availEnd.
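
That pairwise rule is what the dbo.Overlap function in this answer implements. Below is a Python sketch of that function's logic, not the T-SQL itself. The function uses >= and <=, so two periods that merely touch are treated as separate; this doesn't change the minute total. The helper `t` is hypothetical.

```python
from datetime import datetime


def t(hhmm):
    # Hypothetical helper: a time on the question's sample date
    return datetime.strptime("2012-11-19 " + hhmm, "%Y-%m-%d %H:%M")


def overlap(avail_start, avail_end, avail_start2, avail_end2):
    """Mirror of dbo.Overlap: the merged period if the two intersect, else the first period."""
    if avail_start >= avail_end2 or avail_end <= avail_start2:
        return avail_start, avail_end  # disjoint (periods that only touch count as disjoint)
    return min(avail_start, avail_start2), max(avail_end, avail_end2)


print(overlap(t("16:50"), t("17:10"), t("16:00"), t("17:00")))  # merged: 16:00-17:10
print(overlap(t("17:30"), t("18:10"), t("16:00"), t("17:10")))  # disjoint: 17:30-18:10 unchanged
```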

There may be more than one merged period. In your case:
16:00:00 - 17:10:00 includes the ranges: 16:00:00 - 16:50:00,
                                         16:00:00 - 16:50:00,
                                         16:00:00 - 17:00:00,
                                         16:00:00 - 17:10:00
17:30:00 - 18:30:00 includes the ranges: 17:30:00 - 18:10:00,
                                         18:00:00 - 18:30:00

UPDATE 21.11.2012; 30.11.2012; 04.01.2013

-- Returns the merged period if the two periods overlap; otherwise returns the first period unchanged
CREATE FUNCTION dbo.Overlap
 (
  @availStart datetime,
  @availEnd datetime,
  @availStart2 datetime,
  @availEnd2 datetime
  )
RETURNS TABLE
RETURN
  SELECT CASE WHEN @availStart >= @availEnd2 OR @availEnd <= @availStart2
              THEN @availStart ELSE
                               CASE WHEN @availStart > @availStart2 THEN @availStart2 ELSE @availStart END
                               END AS availStart,
         CASE WHEN @availStart >= @availEnd2 OR @availEnd <= @availStart2
              THEN @availEnd ELSE
                             CASE WHEN @availEnd > @availEnd2 THEN @availEnd ELSE @availEnd2 END
                             END AS availEnd
GO

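;WITH cte AS
 (
  -- Number the user's ranges in start order
  SELECT userID, availStart, availEnd, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY AvailStart) AS Id
  FROM dbo.Available
  WHERE userID = 456 -- cte2 joins on Id alone, so restrict to one user
  ), cte2 AS
 (
  SELECT Id, availStart, availEnd
  FROM cte
  WHERE Id = 1
  UNION ALL
  -- Merge the next range into the running range when they overlap; otherwise start a new one
  SELECT c.Id, o.availStart, o.availEnd
  FROM cte c JOIN cte2 ct ON c.Id = ct.Id + 1
             CROSS APPLY dbo.Overlap(c.availStart, c.availEnd, ct.availStart, ct.availEnd) AS o
  )
  -- Rows of one merged period share its availStart; keep its latest availEnd and sum the lengths
  SELECT TOP 1 SUM(DATEDIFF(minute, availStart, MAX(availEnd))) OVER()
  FROM cte2
  GROUP BY availStart
  OPTION (MAXRECURSION 0)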

Demo on SQLFiddle

#5

Create Table #Available (
  ID int not null primary key,
  UserID int not null,
  AvailStart datetime not null,
  AvailEnd datetime not null
)


Insert Into #Available (ID,UserID, AvailStart, AvailEnd) Values
  (1,456, '2012-11-19 16:00', '2012-11-19 17:00'),
  (2,456, '2012-11-19 16:00', '2012-11-19 16:50'),
  (3,456, '2012-11-19 18:00', '2012-11-19 18:30'),
  (4,456, '2012-11-19 17:30', '2012-11-19 18:10'),
  (5,456, '2012-11-19 16:00', '2012-11-19 17:10'),
  (6,456, '2012-11-19 16:00', '2012-11-19 16:50'),
  (7,457, '2012-11-19 16:00', '2012-11-19 17:10'),
  (8,457, '2012-11-19 16:00', '2012-11-19 16:50');  
Select Distinct UserID 
into #users
from #Available


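-- One row per user per minute, from the earliest start to the latest end
Create Table #mins(UserID int,atime datetime,aset tinyint )
Declare @start Datetime
Declare @end Datetime

Select @start=min(AvailStart),@end=max(AvailEnd) from #Available 
While @start<@end
    begin
     insert into #mins(UserID,atime) 
     Select UserID ,@Start from #users
     Select @start=DateAdd(mi,1,@start)
    end

-- Flag every minute that falls inside at least one of the user's ranges (end exclusive),
-- so a minute covered by overlapping ranges is flagged only once
update #mins set aset=1
from #Available
where atime>=AvailStart and atime<Availend and #mins.UserID = #Available.UserID


-- SUM ignores the NULL (unflagged) minutes
select UserID,SUM(aset) as [Minutes] 
from #mins
Group by UserID 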
Drop table #Available
Drop table #mins
Drop table #users
