SQL to summarize min/max active counts over time periods

This is an SQL problem I can't wrap my head around in a simple query. Is it possible?

The data set is (letters added for ease of understanding):

Start      End
10:01      10:12     (A)
10:03      10:06     (B)
10:05      10:25     (C)
10:14      10:42     (D)
10:32      10:36     (E)

The desired output is:

PeriodStart   New       ActiveAtEnd    MinActive   MaxActive
09:50         0         0              0           0
10:00         3 (ABC)   2 (AC)         0           3 (ABC)
10:10         1 (D)     2 (CD)         1 (C)       2 (AC or CD)
10:20         0         1 (D)          1 (C)       2 (CD)
10:30         1 (E)     1 (D)          1 (D)       2 (DE)
10:40         0         0              0           1 (D)
10:50         0         0              0           0

So, the query needed is a summary of the first table: for each 10-minute period, it calculates the minimum and the maximum number of overlapping (Start-End) rows from the first table.

'New' is the number of rows with a Start in the summary period. 'ActiveAtEnd' is the number of rows active at the end of the summary period.

I'm using Oracle, but I'm sure a solution can be adjusted. Stored procedures are not allowed, just plain SELECT/INSERT (views are allowed). It's also OK to run one SQL command per 10-minute output (once populated, that is how it will be kept up to date).

Thanks for any ideas, including 'not possible' ;-)

4 Solutions

#1

Assuming you also have (or create) a table named @Times with one record for each ten-minute start time, how about...

   -- T is the outer @Times row (one per period); X is testTab probed at each sample time.
   Select T.Start,
      -- New: rows that start inside [T.Start, T.Start + 10 min)
      (Select Count(*) From testTab
       Where Start >= T.Start
          And Start < DateAdd(minute, 10, T.Start)) New,
      -- ActiveAtEnd: rows that span the end of the period
      (Select Count(*) From testTab
       Where Start < DateAdd(minute, 10, T.Start)
          And EndDt > DateAdd(minute, 10, T.Start)) ActiveAtEnd,
      -- MaxActive: sample at each row start in the period plus both period boundaries
      (Select Max(Cnt) From
          (Select Count(Distinct X.Which) Cnt
           From  (Select Distinct Start
                    From testTab
                    Where Start Between T.Start
                           And DateAdd(minute, 10, T.Start)
                    Union Select T.Start
                    Union Select DateAdd(minute, 10, T.Start)) Z
             Left Join testTab X
               On Z.Start Between X.Start And X.EndDt
           Group By Z.Start) ZZ ) MaxActive,
      -- MinActive: same sample points, so a dip that only follows an end time can be missed
      (Select Min(Cnt) From
           (Select Count(Distinct X.Which) Cnt
            From  (Select Distinct Start
                   From testTab
                   Where Start Between T.Start
                           And DateAdd(minute, 10, T.Start)
                   Union Select T.Start
                   Union Select DateAdd(minute, 10, T.Start)) Z
              Left Join testTab X
                 On Z.Start Between X.Start And X.EndDt
           Group By Z.Start) ZZ ) MinActive
   From @Times T

I created this table in SQL Server as a table variable, using

Declare @Times Table (Start datetime Primary key Not Null)
Declare @Start DateTime 
Set @Start = '1 Nov 2008 10:00'
While @Start < '1 Nov 2008 11:00' begin
    Insert @Times(Start) values(@Start)
    Set @Start = DateAdd(minute, 10, @Start) 
End

If you are using another product, use a temp table instead... but this approach does need a table with one record for each ten minute "period" as a hook to run against...

With the following data, this query generates the output below:

start                   endDt                   Which
----------------------- ----------------------- -----
2008-11-01 10:01:00.000 2008-11-01 10:12:00.000 A
2008-11-01 10:03:00.000 2008-11-01 10:06:00.000 B
2008-11-01 10:05:00.000 2008-11-01 10:25:00.000 C
2008-11-01 10:14:00.000 2008-11-01 10:42:00.000 D
2008-11-01 10:32:00.000 2008-11-01 10:36:00.000 E
2008-11-01 10:22:00.000 2008-11-01 10:51:00.000 F
2008-11-01 10:22:00.000 2008-11-01 10:23:00.000 G

Start                   New         ActiveAtEnd MaxActive   MinActive
----------------------- ----------- ----------- ----------- -----------
2008-11-01 10:00:00.000 3           2           3           0
2008-11-01 10:10:00.000 1           2           2           2
2008-11-01 10:20:00.000 2           2           4           2
2008-11-01 10:30:00.000 1           2           3           2
2008-11-01 10:40:00.000 0           1           2           1
2008-11-01 10:50:00.000 0           0           1           0

Warning: Null value is eliminated by an aggregate or other SET operation.

#2

I'm struggling with the ActiveAtEnd value, but the others are OK.

This is for MySQL:

set @active:=0;

select 
  period, 
  sum( if( score=1, 1, 0)) New, 
  if( max(ab) > max(aa), max(ab), max(aa)) MaxActive, 
  if( min( ab ) < min( aa ), min(ab), min(aa)) MinActive  
from  (
        select 
          period, 
          etime, 
          score, 
          @active ab, 
          @active:=@active+score aa 
        from (
               select 
                 from_unixtime( floor( unix_timestamp(start)/600) * 600) period, 
                 start etime, 
                 +1 score 
               from ev  
               union all
               select from_unixtime( floor( unix_timestamp(end)/600) * 600) period, 
                 end etime, 
                 -1 score
               from ev 
             ) event order by etime
      ) as temp 
group by period;

The innermost selection breaks the original table into a set of events - with a score of +1 for a start-event, and -1 for an end event. union all is used so that duplicate events are allowed.

The next inner selection runs a variable across the score values: @active holds a count of the number of active intervals at each time point. The value of @active is selected both before (ab) and after (aa) the current score is added. I do not know how portable this is.

The outermost selection accumulates the results for each period. 'New' is the number of '+1' scores. MaxActive and MinActive must both take the active-before (ab) and active-after (aa) values into account.

Here are sample results:

+---------------------+------+-----------+-----------+
| period              | New  | MaxActive | MinActive |
+---------------------+------+-----------+-----------+
| 2008-11-19 10:00:00 |    3 |         3 |         0 |
| 2008-11-19 10:10:00 |    1 |         2 |         1 |
| 2008-11-19 10:20:00 |    0 |         2 |         1 |
| 2008-11-19 10:30:00 |    1 |         2 |         1 |
| 2008-11-19 10:40:00 |    0 |         1 |         0 |
+---------------------+------+-----------+-----------+
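
The event-sweep idea described in this answer can be sketched outside the database to check the logic, including the ActiveAtEnd column that the MySQL query leaves out. This is only an illustrative sketch in Python, not a translation of the query: the function name `summarize`, the use of minutes since midnight, and the tie rule (an end is applied before a start at the same minute) are all assumptions. A row that ends exactly on a period boundary is still counted as active at the end of that period, which is also an assumption.

```python
def summarize(intervals, first_period, last_period, width=10):
    """Sweep +1/-1 events and summarize them per period.

    intervals: list of (start, end) pairs in minutes since midnight.
    Returns tuples (period_start, new, active_at_end, min_active, max_active).
    """
    # One +1 event per start and one -1 event per end. Sorting the tuples
    # puts -1 before +1 when two events share the same minute.
    events = sorted(
        [(s, +1) for s, _ in intervals] + [(e, -1) for _, e in intervals]
    )
    rows = []
    active = 0  # running count of active intervals
    i = 0       # index of the next unprocessed event
    for period in range(first_period, last_period + 1, width):
        period_end = period + width
        # Catch up on events before the first requested period.
        while i < len(events) and events[i][0] < period:
            active += events[i][1]
            i += 1
        new = 0
        lo = hi = active  # count carried in from the previous period
        while i < len(events) and events[i][0] < period_end:
            delta = events[i][1]
            active += delta
            if delta == 1:
                new += 1
            lo = min(lo, active)
            hi = max(hi, active)
            i += 1
        rows.append((period, new, active, lo, hi))
    return rows


# Rows A to E from the question, as minutes since midnight (10:01 -> 601).
data = [(601, 612), (603, 606), (605, 625), (614, 642), (632, 636)]
rows = summarize(data, first_period=590, last_period=650)
for period, new, at_end, lo, hi in rows:
    print(f"{period // 60:02d}:{period % 60:02d}", new, at_end, lo, hi)
```

Because the running count is carried from one period into the next, periods that contain no events at all still get a row, and for the question's data the numbers agree with the desired output table.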

#3

New and ActiveAtEnd are fairly straightforward (assuming the period's start and end are stored in variables):

select  @periodStart PeriodStart
      , @periodEnd   PeriodEnd
      , n.[new]
      , ae.ActiveAtEnd
from (
    -- rows that start inside the period
    select  count(*) [new]
    from    @times
    where   [start] >= @periodStart
    and     [start] <  @periodEnd
) n
cross join
(
    -- rows still running when the period ends
    select  count(*) [ActiveAtEnd]
    from    @times
    where   [start] <  @periodEnd
    and     [end]   >= @periodEnd
) ae

MaxActive and MinActive are harder. You can assume one-minute granularity, so you would need to expand each active interval into one-minute slices to be able to probe each slice.

I'm not sure that that's possible in a single query.

#4

The only way that I have ever been able to solve this sort of problem has been to get the count of 'start' for each one minute period. You then get the maximum (or minimum) for the 10 minute group. I have not been able to apply a set based approach.
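
As a rough illustration of the per-minute approach, the sketch below counts the active rows at every minute and then takes the minimum and maximum per 10-minute group in a single SELECT. It uses SQLite through Python's `sqlite3` module purely so that it is self-contained. The table `ev`, its columns `start_min` and `end_min` (minutes since midnight), the fixed 590 to 659 minute range, and the rule that a row is active at minute m when `start_min <= m < end_min` are all assumptions made for the example. Oracle has its own row-generation idioms (for example CONNECT BY LEVEL against dual), so the recursive part would need adjusting there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ev (name TEXT, start_min INTEGER, end_min INTEGER)")
conn.executemany(
    "INSERT INTO ev VALUES (?, ?, ?)",
    [("A", 601, 612), ("B", 603, 606), ("C", 605, 625),
     ("D", 614, 642), ("E", 632, 636)],
)

QUERY = """
WITH RECURSIVE minutes(m) AS (
    -- one row per minute from 09:50 (590) to 10:59 (659)
    SELECT 590
    UNION ALL
    SELECT m + 1 FROM minutes WHERE m < 659
),
per_minute AS (
    -- number of rows active at each minute
    SELECT m,
           (SELECT COUNT(*) FROM ev
             WHERE ev.start_min <= m AND m < ev.end_min) AS active
    FROM minutes
)
SELECT (m / 10) * 10 AS period_start,
       MIN(active)   AS min_active,
       MAX(active)   AS max_active
FROM per_minute
GROUP BY m / 10
ORDER BY period_start
"""

minute_rows = conn.execute(QUERY).fetchall()
for period_start, min_active, max_active in minute_rows:
    print(f"{period_start // 60:02d}:{period_start % 60:02d}",
          min_active, max_active)
```

Sampling once per minute only works if every start and end falls on a whole minute. A row shorter than the sampling step could otherwise be missed entirely.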
