Advanced average date difference with unique IDs

Time: 2021-10-31 21:54:42

I'm back on Stack Overflow with another headache that I have been trying to get to the bottom of with no success at all, no matter how many times I use AVG(DATEDIFF(...)) functions.

I have an SQL table like the one below:

ID | PersonID | Start               | End                 | Status
1  | 1        | 2006-03-21 00:00:00 | 2007-05-19 00:00:00 | Active
2  | 1        | 2007-05-19 00:00:00 | 2007-05-20 00:00:00 | Active
3  | 2        | 2016-08-24 00:00:00 | 2016-08-25 00:00:00 | Active
4  | 2        | 2016-08-25 00:00:00 | 2016-08-28 00:00:00 | Active
5  | 2        | 2016-08-28 00:00:00 | 2017-10-05 00:00:00 | Active

I'm trying to find the average active stay (in days) across all unique people.

That is, the average number of days based on each person's EARLIEST start date and LATEST end date (as a single PersonID can have multiple active statuses).

For example, for PersonID 1 the earliest start date is 2006-03-21 and the latest end date is 2007-05-20. Their stay has therefore been 425 days.

Repeating this for PersonID 2, their stay is 407 days.

After doing this for everyone in the table, I want to get the average length of stay. For the 5 rows above, with 2 unique people, the average is (425 + 407) / 2 = 416. Doing a simple DATEDIFF average across all rows gives me a very inaccurate average of 102.

Hope this makes sense. As always, any help you could give is very much appreciated.

1 solution

#1

So why not try this:

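-- Inner query: one row per person with their earliest start and latest end.
-- Outer query: average of the per-person spans in days.
-- "stays" is a placeholder table name; "table" itself is a reserved word in MySQL.
SELECT
  AVG(DATEDIFF(PersonEnd, PersonStart)) AS AvgStayDays
FROM
  (SELECT
     MIN(`Start`) AS PersonStart,
     MAX(`End`) AS PersonEnd
   FROM
     stays
   GROUP BY
     PersonID) PeriodsPerPerson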
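To check the approach against the sample rows from the question, here is a minimal sketch. The table name `stays`, the `Status` column name and all column types are assumptions (the question only shows the data), and row 4 is taken to start on 2016-08-25, which is what the stated 407 days for PersonID 2 implies.

```sql
-- Hypothetical table name and column types; only the data comes from the question.
CREATE TABLE stays (
  ID       INT PRIMARY KEY,
  PersonID INT NOT NULL,
  `Start`  DATETIME NOT NULL,
  `End`    DATETIME NOT NULL,
  Status   VARCHAR(16) NOT NULL
);

INSERT INTO stays (ID, PersonID, `Start`, `End`, Status) VALUES
  (1, 1, '2006-03-21 00:00:00', '2007-05-19 00:00:00', 'Active'),
  (2, 1, '2007-05-19 00:00:00', '2007-05-20 00:00:00', 'Active'),
  (3, 2, '2016-08-24 00:00:00', '2016-08-25 00:00:00', 'Active'),
  (4, 2, '2016-08-25 00:00:00', '2016-08-28 00:00:00', 'Active'),
  (5, 2, '2016-08-28 00:00:00', '2017-10-05 00:00:00', 'Active');

-- Step 1: one span per person (earliest start to latest end).
-- Gives 425 days for PersonID 1 and 407 days for PersonID 2.
SELECT
  PersonID,
  DATEDIFF(MAX(`End`), MIN(`Start`)) AS StayDays
FROM stays
GROUP BY PersonID;

-- Step 2: average those per-person spans.
-- (425 + 407) / 2 = 416
SELECT
  AVG(DATEDIFF(PersonEnd, PersonStart)) AS AvgStayDays
FROM
  (SELECT
     MIN(`Start`) AS PersonStart,
     MAX(`End`) AS PersonEnd
   FROM stays
   GROUP BY PersonID) PeriodsPerPerson;
```

Running step 1 on its own is a quick way to spot a person whose span looks wrong before averaging.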

Of course, you should have proper indexes so that MySQL can compute MIN and MAX quickly and can group quickly as well, which means indexes covering at least PersonID, Start and End.
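As a rough sketch of what such an index could look like: a single composite index that leads with the grouping column and also contains both date columns, so the query can in principle be answered from the index alone. The table name `stays` and the index name are hypothetical, and whether the optimizer actually uses the index depends on your data and MySQL version, so check with EXPLAIN.

```sql
-- Hypothetical table and index names.
-- PersonID comes first because it is the GROUP BY column;
-- Start and End are included so the table rows need not be read.
CREATE INDEX idx_stays_person_start_end
  ON stays (PersonID, `Start`, `End`);

-- Inspect the plan to see whether the index is picked up.
EXPLAIN
SELECT
  MIN(`Start`) AS PersonStart,
  MAX(`End`) AS PersonEnd
FROM stays
GROUP BY PersonID;
```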

Please note that you really need the alias for the inner query (PeriodsPerPerson) although I don't use it anywhere. If you leave it out, you'll run into an error, at least with MySQL 5.5 (I don't know about later versions).
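To illustrate the alias point, here are the two forms side by side, again using the hypothetical table name `stays`. The error text in the comment is the message MySQL reports for a derived table without an alias.

```sql
-- Without an alias on the derived table, MySQL rejects the statement:
--   SELECT AVG(DATEDIFF(PersonEnd, PersonStart))
--   FROM (SELECT MIN(`Start`) AS PersonStart, MAX(`End`) AS PersonEnd
--         FROM stays GROUP BY PersonID);
--   ERROR 1248 (42000): Every derived table must have its own alias

-- With an alias (any name works; AS is optional), the statement is accepted:
SELECT AVG(DATEDIFF(PersonEnd, PersonStart)) AS AvgStayDays
FROM (SELECT MIN(`Start`) AS PersonStart, MAX(`End`) AS PersonEnd
      FROM stays GROUP BY PersonID) AS PeriodsPerPerson;
```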

If you have millions or even billions of rows, you might be better off moving the calculation into a stored procedure or a back-end application instead of doing it as shown above.
