如果值几乎相同，我如何按DateTime对行进行分组？

How can I group rows if the different time between on field and other field in other row almost the same?

如果字段与其他行中的其他字段之间的不同时间几乎相同,我如何对行进行分组?

For example:

GUID - No    - sDateTime            - eDateTime           - Name  
------------------------------------------------------------------    
0000 - 01    - 2013-02-02 08:00:00  - 2013-02-02 08:40:00 - A 
0000 - 02    - 2013-02-02 08:45:00  - 2013-02-02 09:45:00 - A 
0000 - 03    - 2013-02-02 11:30:00  - 2013-02-02 12:00:00 - A 
0000 - 04    - 2013-02-02 09:55:00  - 2013-02-02 11:00:00 - A   
0000 - 05    - 2013-02-02 11:05:00  - 2013-02-02 12:30:00 - B

How can I get sDateTime from row with eDateTime from other row if the different not more than 10 minutes?

如果不同的时间不超过10分钟,如何从其他行的eDateTime行获取sDateTime?

GUID - No      - SDateTime            - EDateTime (From other row) - Name  
----------------------------------------------------------------------------  
0000 -01,02,04 - 2013-02-02 08:00:00  - **2013-02-02 11:00:00**    - A 
0000 - 03      - 2013-02-02 11:30:00  - 2013-02-02 12:00:00        - A
0000 - 05      - 2013-02-02 11:05:00  - 2013-02-02 12:30:00        - B

2 个解决方案

#1

No cursors here but this is still a loop over rows using a recursive CTE to figure out what rows should be grouped together. The temp table #T is there to hold the ID that connects rows to each other. It could have been done without the temp table but for performance reasons it is better to do it with a temp table instead of having another CTE as the source for the recursive CTE.

这里没有游标,但这仍然是一个循环,使用递归CTE来确定哪些行应该组合在一起。临时表#T用于保存将行相互连接的ID。它可以在没有临时表的情况下完成,但出于性能原因,最好使用临时表而不是使用另一个CTE作为递归CTE的源。

In the final query I use the for xml path trick to concatenate the values from No. That means that the CTE C is used a second time so it will be executed twice. You might want to turn that CTE to a temp table as well to avoid that.

在最后的查询中,我使用for xml路径技巧来连接来自No的值。这意味着第二次使用CTE C,因此它将被执行两次。您可能希望将该CTE转换为临时表以避免这种情况。

SQL Fiddle

MS SQL Server 2008 Schema Setup:

MS SQL Server 2008架构设置:

create table YourTable
(
  GUID varchar(4),
  No varchar(2),
  sDateTime datetime,
  eDateTime datetime,
  Name varchar(1)
);

insert into YourTable values
('0000', '01', '2013-02-02 08:00:00', '2013-02-02 08:40:00', 'A'),
('0000', '02', '2013-02-02 08:45:00', '2013-02-02 09:45:00', 'A'),
('0000', '03', '2013-02-02 11:30:00', '2013-02-02 12:00:00', 'A'),
('0000', '04', '2013-02-02 09:55:00', '2013-02-02 11:00:00', 'A'),
('0000', '05', '2013-02-02 11:05:00', '2013-02-02 12:30:00', 'B');

Query 1:

create table #T
(
  ID int,
  GUID varchar(4),
  No varchar(2),
  sDateTime datetime,
  eDateTime datetime,
  Name varchar(1),
  primary key(ID, GUID, Name)
);

insert into #T(ID, GUID, No, sDateTime, eDateTime, Name)
select row_number() over(partition by GUID, Name order by sDateTime),
       GUID, No, sDateTime, eDateTime, Name
from YourTable;

with C as
(
  select T.ID, T.GUID, T.No, T.sDateTime, T.eDateTime, T.Name, 1 as Grp
  from #T as T
  where T.ID = 1
  union all
  select T.ID, T.GUID, T.No, T.sDateTime, T.eDateTime, T.Name, 
         C.Grp + case when datediff(minute, C.eDateTime, T.sDateTime) > 10 
                   then 1 
                   else 0 
                 end
  from #T as T
    inner join C 
      on T.ID = C.ID + 1 and
         T.Name = C.Name and
         T.GUID = C.GUID
)
select C.GUID,
       (
       select ','+C2.No
       from C as C2
       where C.GUID = C2.GUID and
             C.Name = C2.Name and
             C.Grp = C2.Grp
       order by C2.No
       for xml path(''), type
       ).value('substring(text()[1], 2)', 'varchar(max)') as No,
       min(C.sDateTime) as sDateTime,
       max(C.eDateTime) as eDateTime,
       C.Name
from C
group by C.GUID, C.Name, C.Grp

drop table #T;

Results:

| GUID |       NO |                       SDATETIME |                       EDATETIME | NAME |
----------------------------------------------------------------------------------------------
| 0000 | 01,02,04 | February, 02 2013 08:00:00+0000 | February, 02 2013 11:00:00+0000 |    A |
| 0000 |       03 | February, 02 2013 11:30:00+0000 | February, 02 2013 12:00:00+0000 |    A |
| 0000 |       05 | February, 02 2013 11:05:00+0000 | February, 02 2013 12:30:00+0000 |    B |

Just for fun, a SQL Server 2012 version.

只是为了好玩,一个SQL Server 2012版本。

with CMinDiff as
(
  select GUID, No, sDateTime, eDateTime, Name, 
         case when datediff(minute, 
                            coalesce(lag(eDateTime) over(partition by GUID, Name 
                                                         order by eDateTime), 
                                     sDateTime), 
                            sDateTime) <= 10 
           then 0 
           else 1 
         end as MinDiff
  from YourTable
), CSumMinDiff as
(
  select GUID, No, sDateTime, eDateTime, Name, 
         sum(MinDiff) over(partition by GUID, Name 
                           order by sDateTime 
                           rows between unbounded preceding and current row) as Grp
  from CMinDiff
)
select C.GUID,
       (
       select ','+C2.No
       from CSumMinDiff as C2
       where C.GUID = C2.GUID and
             C.Name = C2.Name and
             C.Grp = C2.Grp
       order by C2.No
       for xml path(''), type
       ).value('substring(text()[1], 2)', 'varchar(max)') as No,
       min(C.sDateTime) as sDateTime,
       max(C.eDateTime) as eDateTime,
       C.Name
from CSumMinDiff as C
group by C.GUID, C.Name, C.Grp

#2

I was unable to perform this without a cursor, maybe somebody smarter can come up with a solution that does not include cursors, but this is what I have:

没有游标我无法执行此操作,也许更聪明的人可以想出一个不包含游标的解决方案,但这就是我所拥有的:

CREATE TABLE MyTable
(
    [GUID] VARCHAR(4),
    [No] VARCHAR(2),
    [sDateTime] DATETIME,
    [eDateTime] DATETIME,
    [Name] VARCHAR(1)
)

INSERT INTO [MyTable]
([GUID], [No], [sDateTime], [eDateTime], [Name])
VALUES
('0000', '01', '2013-02-02 08:00:00', '2013-02-02 08:40:00', 'A'),
('0000', '02', '2013-02-02 08:45:00', '2013-02-02 09:45:00', 'A'),
('0000', '03', '2013-02-02 11:30:00', '2013-02-02 12:00:00', 'A'),
('0000', '04', '2013-02-02 09:55:00', '2013-02-02 11:00:00', 'A'),
('0000', '05', '2013-02-02 11:05:00', '2013-02-02 12:30:00', 'B')

DECLARE @MyTable TABLE 
(
    [GUID] VARCHAR(4),
    [No] VARCHAR(100),
    [sDateTime] DATETIME,
    [eDateTime] VARCHAR(100),
    [Name] VARCHAR(1)
)

DECLARE [MyCursor] CURSOR FOR 
SELECT [GUID], [No], [sDateTime], [eDateTime], [Name]
FROM [MyTable]
ORDER BY [Name], [sDateTime];

DECLARE @GUID AS VARCHAR(4)
DECLARE @No AS VARCHAR(2)
DECLARE @sDateTime AS DATETIME
DECLARE @eDateTime AS DATETIME
DECLARE @Name AS VARCHAR(1)
DECLARE @OldName AS VARCHAR(1)
SET @OldName = ''
DECLARE @MergedNo AS VARCHAR(100)
SET @MergedNo = NULL
DECLARE @sOldDateTime AS DATETIME
DECLARE @eOldDateTime AS DATETIME

OPEN [MyCursor]
FETCH NEXT FROM [MyCursor] 
INTO @GUID, @No, @sDateTime, @eDateTime, @Name

WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@OldName = '')
    BEGIN
        SET @sOldDateTime = @sDateTime
        SET @OldName = @Name
    END
    ELSE IF (@OldName <> @Name OR DATEDIFF(MINUTE, @eOldDateTime, @sDateTime) > 10)
    BEGIN
        INSERT INTO @MyTable
        ([GUID], [No], [sDateTime], [eDateTime], [Name])
        VALUES
        (@GUID, @MergedNo, @sOldDateTime, CASE WHEN @MergedNo LIKE '%,%' THEN '**' + CONVERT(VARCHAR(100), @eOldDateTime, 20) + '**' ELSE CONVERT(VARCHAR(100), @eOldDateTime, 20) END, @Name)

        SET @sOldDateTime = @sDateTime 
        SET @OldName = @Name
        SET @MergedNo = NULL
    END

    SET @MergedNo = COALESCE(@MergedNo + ', ', '') + @No
    SET @eOldDateTime = @eDateTime

    FETCH NEXT FROM [MyCursor] 
    INTO @GUID, @No, @sDateTime, @eDateTime, @Name
END 
CLOSE [MyCursor];
DEALLOCATE [MyCursor];

IF (@OldName <> '')
BEGIN
    INSERT INTO @MyTable
    ([GUID], [No], [sDateTime], [eDateTime], [Name])
    VALUES
    (@GUID, @MergedNo, @sOldDateTime, CASE WHEN @MergedNo LIKE '%,%' THEN '**' + CONVERT(VARCHAR(100), @eOldDateTime, 20) + '**' ELSE CONVERT(VARCHAR(100), @eOldDateTime, 20) END, @Name)
END

SELECT *
FROM @MyTable

DROP TABLE [MyTable]

#1