在SQL查询中组合拆分日期范围

I'm working on a query that needs to have some data rows combined based on date ranges. These rows are duplicated in all the data values, except the date ranges are split. For example the table data may look like

我正在处理一个需要根据日期范围组合一些数据行的查询。除日期范围被拆分外,这些行在所有数据值中都是重复的。例如,表格数据可能看起来像

StudentID   StartDate   EndDate     Field1  Field2
1           9/3/2007    10/20/2007  3       True
1           10/21/2007  6/12/2008   3       True
2           10/10/2007  3/20/2008   4       False
3           9/3/2007    11/3/2007   8       True
3           12/15/2007  6/12/2008   8       True

The result of the query should have the split date ranges combined. The query should combine date ranges with a gap of only one day. If there is more than a one day gap, then the rows shouldn't be combined. The rows that don't have a split date range should come through unchanged. The result would look like

查询的结果应该具有组合的拆分日期范围。查询应该将日期范围与仅一天的差距组合在一起。如果间隔超过一天,则不应合并行。没有拆分日期范围的行应该保持不变。结果看起来像

StudentID   StartDate   EndDate     Field1  Field2
1           9/3/2007    6/12/2008   3       True
2           10/10/2007  3/20/2008   4       False
3           9/3/2007    11/3/2007   8       True
3           12/15/2007  6/12/2008   8       True

What would be the SELECT statement for this query?

这个查询的SELECT语句是什么?

10 个解决方案

#1

The following code should work. I've made a few assumptions as follows: there are no overlaps of date ranges, there are no NULL values in any of the fields, and the start date for a given row is always less than the end date. If your data doesn't fit these criteria, you'll need to adjust this method, but it should point you in the right direction.

以下代码应该有效。我做了一些假设如下:日期范围没有重叠,任何字段都没有NULL值,给定行的开始日期总是小于结束日期。如果您的数据不符合这些条件,则需要调整此方法,但它应指向正确的方向。

You can use subqueries instead of the views, but that can be cumbersome so I used the views to make the code clearer.

您可以使用子查询而不是视图,但这可能很麻烦,因此我使用视图使代码更清晰。

CREATE VIEW dbo.StudentStartDates
AS
    SELECT
        S.StudentID,
        S.StartDate,
        S.Field1,
        S.Field2
    FROM
        dbo.Students S
    LEFT OUTER JOIN dbo.Students PREV ON
        PREV.StudentID = S.StudentID AND
        PREV.Field1 = S.Field1 AND
        PREV.Field2 = S.Field2 AND
        PREV.EndDate = DATEADD(dy, -1, S.StartDate)
    WHERE PREV.StudentID IS NULL
GO

CREATE VIEW dbo.StudentEndDates
AS
    SELECT
        S.StudentID,
        S.EndDate,
        S.Field1,
        S.Field2
    FROM
        dbo.Students S
    LEFT OUTER JOIN dbo.Students NEXT ON
        NEXT.StudentID = S.StudentID AND
        NEXT.Field1 = S.Field1 AND
        NEXT.Field2 = S.Field2 AND
        NEXT.StartDate = DATEADD(dy, 1, S.EndDate)
    WHERE NEXT.StudentID IS NULL
GO


SELECT
    SD.StudentID,
    SD.StartDate,
    ED.EndDate,
    SD.Field1,
    SD.Field2
FROM
    dbo.StudentStartDates SD
INNER JOIN dbo.StudentEndDates ED ON
    ED.StudentID = SD.StudentID AND
    ED.Field1 = SD.Field1 AND
    ED.Field2 = SD.Field2 AND
    ED.EndDate > SD.StartDate AND
    NOT EXISTS (SELECT * FROM dbo.StudentEndDates ED2 WHERE ED2.StudentID = SD.StudentID AND ED2.Field1 = SD.Field1 AND ED2.Field2 = SD.Field2 AND ED2.EndDate < ED.EndDate AND ED2.EndDate > SD.StartDate)
GO

#2

select StudentID, min(StartDate) StartDate, max(EndDate) EndDate, Field1, Field2 
  from table
 group by StudentID, Field1, Field2

#3

In my experience, I have to combine the ranges in post-processing (not in SQL but in my script). I'm not sure that a SQL can do this, particularly because you can never know exactly how many date ranges need to be chained in any particular case. If this can be done though, I'd love to know too.

根据我的经验,我必须在后处理中组合范围(不是在SQL中,而是在我的脚本中)。我不确定SQL是否可以执行此操作,特别是因为您无法确切知道在任何特定情况下需要链接多少个日期范围。如果可以做到这一点,我也很想知道。

EDIT: My answer is assuming that you have more than one range of dates per student, not just a start and an end. If you only have the one date range with no gaps, then the other mentioned solutions are the way to go.

编辑:我的答案是假设每个学生有多个日期,而不仅仅是开始和结束。如果您只有一个没有间隙的日期范围,那么其他提到的解决方案是可行的方法。

#4

If the min()/max() solutions are not good enough (e.g. if the dates are not contiguous and you want to group separate date ranges separately), I wonder if something using Oracle's START WITH and CONNECT BY clauses would work. Which, of course, wouldn't work on every database.

如果min()/ max()解决方案不够好(例如,如果日期不是连续的并且您想要分别对不同的日期范围进行分组),我想知道使用Oracle的START WITH和CONNECT BY子句是否有效。当然,这对每个数据库都不起作用。

#5

EDIT: Make another set of SQL for Access. I tested all of this, but piece by piece because I don't know how to make several statements at one time in Access. Since I also don't know how to do comments, you can see the comments in the SQL version, below.

编辑:为Access创建另一组SQL。我测试了所有这些,但一块一块,因为我不知道如何在Access中一次发表几个语句。由于我也不知道如何做评论,你可以在下面的SQL版本中看到评论。

select 
studentid, min(startdate) as Starter, max(enddate) as Ender, field1, field2, 
max(startDate) - Min(endDate)  as MaxGap 
into tempIDs
from student 
group by studentid, field1, field2 ;  

delete from tempIDs where MaxGap > 1;

UPDATE student INNER JOIN TempIDs ON Student.studentID = TempIDS.StudentID
SET Student.StartDate = [TempIDs].[Starter],
 Student.EndDate = [TempIDs].[Ender];

I think this is it, in SQL Server - I didn't do it in Access. I haven't tested it for fancy conditions such as overlapping several records, etc., but this should get you started. It updates all the duplicate, small-gap records, leaving extras in the database. MSDN has a page on eliminating duplicates: http://support.microsoft.com/kb/139444

我认为就是这样,在SQL Server中 - 我没有在Access中这样做。我没有测试它的重要性,如重叠几个记录等,但这应该让你开始。它会更新所有重复的小间隙记录,并在数据库中留下额外内容。 MSDN有一个关于消除重复的页面:http://support.microsoft.com/kb/139444

select 
studentid, min(startdate) as StartDate, max(enddate) as EndDate, field1, field2, 
datediff(dd, Min(endDate),max(startDate)) as MaxGap 
into #tempIDs
from #student 
group by studentid, field1, field2    

-- Update the relevant records.  Keeps two copies of the massaged record 
-- - extra will need to be deleted.

update #student 
set startdate = #TempIDS.startdate, enddate = #tempIDS.EndDate
from #tempIDS 
where #student.studentid = #TempIDs.StudentID and MaxGap < 2

#6

An alternate final query to the one provided by Tom H. in the accepted answer is

对Tom H.在接受的答案中提供的最终查询是

SELECT
    SD.StudentID,
    SD.StartDate,
    MIN(ED.EndDate),
    SD.Field1,
    SD.Field2
FROM
    dbo.StudentStartDates SD
INNER JOIN dbo.StudentEndDates ED ON
    ED.StudentID = SD.StudentID AND
    ED.Field1 = SD.Field1 AND
    ED.Field2 = SD.Field2 AND
    ED.EndDate > SD.StartDate
GROUP BY
    SD.StudentID, SD.Field1, SD.Field2, SD.StartDate

This also worked on all test data.

这也适用于所有测试数据。

#7

This is a classic problem in SQL (the language) e.g. covered in Joe Celko's books 'SQL for Smarties" (chapter 23, Regions, Runs, Gaps, Sequences and Series) and his latest book "Thinking in Sets" (chapter 15).

这是SQL(语言)中的典型问题,例如Joe Celko的书籍“适合聪明的人的SQL”(第23章,地区,运行,差距,序列和系列)以及他的最新着作“集思考”(第15章)。

While it's 'fun' to fix the data at run time with a monster query, for me this is one of those situations that can be better fixed off line and procedurally (personally I'd do it with formulas in an Excel spreadsheet).

虽然使用怪物查询在运行时修复数据很有趣,但对我而言,这是可以更好地离线和程序性修复的情况之一(我个人会在Excel电子表格中使用公式来实现)。

The important thing is to put in place effective database constraints to prevent the overlapping periods reoccurring. Again, writing sequenced constraints in SQL is a classic: see Snodgrass (http://www.cs.arizona.edu/people/rts/tdbbook.pdf). Hint for MS Access users: you'll need to use CHECK constraints.

重要的是要建立有效的数据库约束,以防止重复的重复周期。再次,在SQL中编写顺序约束是一种经典:参见Snodgrass(http://www.cs.arizona.edu/people/rts/tdbbook.pdf)。提示MS Access用户:您需要使用CHECK约束。

#8

Heres an example with test data using SQL Server 2005/2008 syntax.

下面是使用SQL Server 2005/2008语法测试数据的示例。

DECLARE @Data TABLE(
    CalendarDate datetime )

INSERT INTO @Data( CalendarDate )
-- range start
SELECT '1 Jan 2010'
UNION ALL SELECT '2 Jan 2010'
UNION ALL SELECT '3 Jan 2010'
-- range start
UNION ALL SELECT '5 Jan 2010'
-- range start
UNION ALL SELECT '7 Jan 2010'
UNION ALL SELECT '8 Jan 2010'
UNION ALL SELECT '9 Jan 2010'
UNION ALL SELECT '10 Jan 2010'

SELECT DateGroup, Min( CalendarDate ) AS StartDate, Max( CalendarDate ) AS EndDate
FROM(   SELECT NextDay.CalendarDate, 
            DateDiff( d, RangeStart.CalendarDate, NextDay.CalendarDate ) - ROW_NUMBER() OVER( ORDER BY NextDay.CalendarDate ) AS DateGroup
        FROM( SELECT Min( CalendarDate ) AS CalendarDate
                FROM @data ) AS RangeStart
            JOIN @data AS NextDay
                ON NextDay.CalendarDate >= RangeStart.CalendarDate ) A
GROUP BY DateGroup

#9

Have you considered a non-equi join? That would look something like this:

你考虑过非equi加入吗?这看起来像这样:

SELECT A.StudentID, A.StartDate, A.EndDate, A.Field1, A.Field2
FROM tblEnrollment AS A LEFT JOIN tblEnrollment AS B ON (A.StudentID = B.StudentID) 
   AND (A.EndDate=B.StartDate-1)
WHERE B.StudentID Is Null;

What that gives you is all the records that don't have a corresponing record that starts the day after the ending date of the first record.

给你的是所有没有相应记录的记录,这些记录从第一个记录的结束日期开始。

[Caveat: Beware that you can only edit a non-equi join in the Access query designer in SQL View -- switching to Design View could cause the join to be lost (though if you do switch Access tells you about the problem, and if you immediately switch back to SQL View, you won't lose it)]

[警告:请注意,您只能在SQL视图中编辑Access查询设计器中的非equi连接 - 切换到设计视图可能会导致连接丢失(尽管如果您切换Access会告诉您有关此问题的信息,以及是否你立即切换回SQL View,你不会失去它)]

If you then UNION that with this:

如果你然后用这个UNION:

SELECT A.StudentID, A.StartDate, B.EndDate, A.Field1, A.Field2
FROM tblEnrollment AS A INNER JOIN tblEnrollment AS B ON (A.StudentID = B.StudentID) 
   AND (A.EndDate= B.StartDate-1)

It should give you what you need, assuming there are never more than two contiguous records at a time. I'm not sure how you'd do it if you had more than two contiguous records (it might involve looking at StartDate-1 compared to EndDate), but this might get you started in the right direction.

假设一次只有两个以上的连续记录,它应该给你所需要的东西。如果您有两个以上的连续记录(可能需要查看StartDate-1与EndDate相比),我不确定您是如何做到的,但这可能会让您开始朝着正确的方向前进。

#10

SELECT StudentID, MIN(startdate) AS startdate, MAX(enddate), field1, field2
FROM tablex
GROUP BY StudentID, field1, field2

That would yield you the result assuming the wasn't a gap between on student's time range.

如果假设学生的时间范围之间没有差距,那将会产生结果。

#1