What is the most straightforward way to pad empty dates in SQL results (on either the MySQL or Perl end)?

Time: 2022-12-26 01:28:20

I'm building a quick csv from a mysql table with a query like:

select DATE(date),count(date) from table group by DATE(date) order by date asc;

and just dumping the rows to a file in Perl with a loop like:

while(my($date,$sum) = $sth->fetchrow) {
    print CSV "$date,$sum\n"
}

There are date gaps in the data, though:

| 2008-08-05 |           4 | 
| 2008-08-07 |          23 | 

I would like to pad the data to fill in the missing days with zero-count entries to end up with:

| 2008-08-05 |           4 | 
| 2008-08-06 |           0 | 
| 2008-08-07 |          23 | 

I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.

Any genius ideas, or slaps in the face for why I'm being so dumb?


I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:

  • I know the date range I'll be looking for every time
  • Unfortunately, the server in question is not one I can install Perl modules on at the moment, and it was decrepit enough that it didn't have anything remotely Date::-y installed

The Perl Date/DateTime-iterating answers were also very good; I wish I could select multiple answers!

9 solutions

#1


20  

When you need something like that on the server side, you usually create a table containing all possible dates between two points in time, and then left join that table with the query results. Something like this:

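delimiter //

create procedure sp1(d1 date, d2 date)
begin
  declare cur_day date;

  -- calendar table: one row per day from d1 to d2 inclusive
  create temporary table foo (d date not null);

  set cur_day = d1;
  while cur_day <= d2 do
    insert into foo (d) values (cur_day);
    set cur_day = date_add(cur_day, interval 1 day);
  end while;

  -- left join so that days without rows still appear, with a count of 0
  -- (`table` is the question's placeholder table name)
  select foo.d, count(t.date)
  from foo left join `table` as t on foo.d = date(t.date)
  group by foo.d order by foo.d asc;

  drop temporary table foo;
end //

delimiter ;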

In this particular case, though, it would be better to do a small check on the client side: if the current date is not the previous date + 1, print the missing rows in between.

#2


7  

When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.

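-- `table` is the question's placeholder name; compare on the day only, since d.date may carry a time part
SELECT DATE(r.date), COUNT(d.date)
FROM dates AS r
LEFT JOIN `table` AS d ON DATE(d.date) = r.date
GROUP BY DATE(r.date)
ORDER BY DATE(r.date) ASC;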

As for output, I'd just use SELECT ... INTO OUTFILE instead of generating the CSV by hand. That also frees us from worrying about escaping special characters.

#3


4  

Not dumb; inserting the empty date values simply isn't something MySQL does. I do this in Perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then create a Date::EzDate object and increment it by day, so...

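use Date::EzDate;

# new() with no arguments is today; in practice, start from the first date of your range
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
    # EzDate stringifies automatically in the format specified in 'default'
    my $count = $hash_o_data{$current_date} || 0;   # days missing from the hash count as 0
    print "$current_date\t|\t$count\n";
    $current_date++;
}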

where $final_date is another EzDate object or a string containing the end of your date range.

EzDate isn't on CPAN right now, but you can probably find another Perl module that does date comparisons and provides a date incrementor.

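The two-step idea in this answer (load the query rows into a lookup keyed by date, then walk every day in the range) can be sketched without any date module from CPAN. The sketch below is in Python rather than Perl purely so that it is self-contained; `pad_from_lookup` and the sample data are hypothetical names for illustration, not part of the original answer.

```python
from datetime import date, timedelta


def pad_from_lookup(counts, first, last):
    """Return (iso_date, count) for every day from first to last inclusive.

    counts maps "YYYY-MM-DD" strings to counts (step one: the query rows
    loaded into a dict); days missing from it are reported with 0.
    """
    padded = []
    day = first
    while day <= last:
        key = day.isoformat()
        padded.append((key, counts.get(key, 0)))
        day += timedelta(days=1)
    return padded


# sample rows taken from the question
counts = {"2008-08-05": 4, "2008-08-07": 23}
for day, count in pad_from_lookup(counts, date(2008, 8, 5), date(2008, 8, 7)):
    print(f"{day}\t|\t{count}")
```

Because the range endpoints are passed in explicitly, this variant also pads days before the first row and after the last row, which a purely row-driven loop cannot do.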

#4


4  

You could use a DateTime object:

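use DateTime;
my $dt;   # last date written to the CSV

while ( my ($date, $sum) = $sth->fetchrow_array )  {
    if (defined $dt) {
        # step forward one day at a time, writing a zero row for each missing day
        print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
    }
    else {
        # first row: start the counter at the first date in the result set
        my ($y, $m, $d) = split /-/, $date;
        $dt = DateTime->new(year => $y, month => $m, day => $d);
    }
    print CSV "$date,$sum\n";
}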

The above code keeps the last printed date in a DateTime object, $dt. When the current row's date is more than one day later, it increments $dt one day at a time (printing a zero-count line to CSV for each day) until it reaches the current row's date.

This way you don't need extra tables, and don't need to fetch all your rows in advance.

#5


1  

Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:

use DateTime;
use DateTime::Format::Strptime qw(strptime);

my @row = $sth->fetchrow_array;
# assumes at least one row came back; both dates start at the first row's date
my $countdate = strptime("%Y-%m-%d", $row[0]);
my $thisdate  = strptime("%Y-%m-%d", $row[0]);

while (@row) {
  # keep looping countdate until it hits the next db row date
  if (DateTime->compare($countdate, $thisdate) == -1) {
    # counter has not reached the next row's date yet: print a zero row
    print CSV $countdate->ymd . ",0\n";
    $countdate->add( days => 1 );
    next;
  }

  # countdate is equal to this row's date, so print the real count instead
  print CSV $thisdate->ymd . ",$row[1]\n";

  # advance both: fetch the next row (an empty list ends the loop) and step the counter
  @row = $sth->fetchrow_array;
  $thisdate = strptime("%Y-%m-%d", $row[0]) if @row;
  $countdate->add( days => 1 );
}

Hmm, that turned out to be more complicated than I thought it would be. I hope it makes sense!

#6


1  

I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).

CREATE TABLE IF NOT EXISTS `Ordinal` (
  `n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc

Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:

SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC

The two things you would need to change about this are the starting point and the interval. I have used SET @var = 'value' syntax for clarity.

SET @end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET @begin = @end - INTERVAL 3 MONTH;
SET @period = DATEDIFF(@end, @begin);

-- `n` starts at 1 (AUTO_INCREMENT), so row n is the n-th day after @begin
SELECT @begin + INTERVAL `n` DAY AS `date`
FROM `Ordinal` WHERE `n` <= @period
ORDER BY `n` ASC;

So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:

SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
    -- `n` starts at 1 (AUTO_INCREMENT), so this yields every day after the start date, up to and including the end date
    SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL `n` DAY AS `date`
    FROM `Ordinal`
    WHERE `n` <= (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
) AS `ord`
LEFT JOIN `Message` AS `msg`
  ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
ORDER BY `ord`.`date` ASC;

Tips and Comments:

  • Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
  • You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
  • You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
  • You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.

#7


1  

I hope you will figure out the rest.

select  * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
   union all select 1
   union all select 2
   union all select 3
   union all select 4
   union all select 5
   union all select 6
   union all select 7
   union all select 8
   union all select 9) n1,
(select 0 as num
   union all select 1
   union all select 2
   union all select 3
   union all select 4
   union all select 5
   union all select 6
   union all select 7
   union all select 8
   union all select 9) n2,
(select 0 as num
   union all select 1
   union all select 2
   union all select 3
   union all select 4
   union all select 5
   union all select 6
   union all select 7
   union all select 8
   union all select 9) n3,
(select 0 as num
   union all select 1
   union all select 2
   union all select 3
   union all select 4
   union all select 5
   union all select 6
   union all select 7
   union all select 8
   union all select 9) n4,
(select 0 as num
   union all select 1
   union all select 2
   union all select 3
   union all select 4
   union all select 5
   union all select 6
   union all select 7
   union all select 8
   union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date

With

select n3.num*100+n2.num*10+n1.num as date

you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1)

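For example, if n3 only went up to 3, the largest value would be 399, so counting 0 the SELECT would return 400 records (400 calendar dates). The full query above uses five tables of the digits 0-9, which gives 100,000 dates starting from 2003-01-01.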

You can tune your dynamic calendar by limiting it, for example from the min(date) in your data to now().

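The arithmetic behind the cross join may be easier to see outside SQL. The following is a rough Python sketch (not MySQL, and not the answerer's code) in which each `range(10)` stands in for one "select 0 ... union all select 9" subquery; `generate_calendar` is a hypothetical helper name.

```python
from datetime import date, timedelta
from itertools import product

DIGITS = range(10)  # plays the role of one "select 0 ... union all select 9" subquery


def generate_calendar(start, n_digits):
    """Cross-join n_digits digit tables and turn each combination into a date."""
    days = []
    for digits in product(DIGITS, repeat=n_digits):
        # digits is most-significant first, e.g. (n3, n2, n1) -> n3*100 + n2*10 + n1
        offset = 0
        for digit in digits:
            offset = offset * 10 + digit
        days.append(start + timedelta(days=offset))
    return days


# three digit tables give offsets 0..999, i.e. 1000 consecutive dates
calendar = generate_calendar(date(2003, 1, 1), n_digits=3)
print(len(calendar), calendar[0], calendar[-1])
```

Every combination of digits maps to a distinct offset, so n digit tables always produce 10^n consecutive days; filtering afterwards (as the WHERE clause in the query does) trims that to the range you need.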

#8


0  

Use a Perl module to do the date calculations, such as the recommended DateTime or Time::Piece (core from 5.10). Just increment the date and print the date and 0 until it matches the current row's date.

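As a minimal sketch of that loop, here is the same "increment and print 0 until the date matches" logic written in Python (chosen only so the example is self-contained; in Perl, Time::Piece or DateTime would play the role of `datetime`). The function name `pad_missing_days` and the sample rows are assumptions for illustration; the rows are expected in ascending date order, as the question's query returns them.

```python
from datetime import date, timedelta


def pad_missing_days(rows):
    """Yield (iso_date, count) for every day between the first and last row.

    rows: iterable of ("YYYY-MM-DD", count) pairs sorted by date ascending.
    """
    expected = None  # the next calendar day we expect to see
    for day_str, count in rows:
        current = date.fromisoformat(day_str)
        # emit a zero-count row for each day skipped since the previous row
        while expected is not None and expected < current:
            yield expected.isoformat(), 0
            expected += timedelta(days=1)
        yield day_str, count
        expected = current + timedelta(days=1)


# sample rows taken from the question
rows = [("2008-08-05", 4), ("2008-08-07", 23)]
for day, count in pad_missing_days(rows):
    print(f"{day},{count}")
```

Since it is a generator, rows can be streamed straight from the database cursor to the CSV file without holding the whole result set in memory.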

#9


-1  

I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.

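For a fixed, known range the idea does work. Below is a small runnable sketch using Python's built-in sqlite3 module with an in-memory database, so it demonstrates the left-join technique rather than MySQL itself; the `events` and `calendar` tables, the `created` column and the `counts_per_day` helper are all hypothetical names invented for the example.

```python
import sqlite3
from datetime import date, timedelta


def counts_per_day(conn, first, last):
    """Return [(day, count)] for every day in [first, last], zero-filled."""
    cur = conn.cursor()
    # hypothetical calendar table: one row per day in the wanted range
    cur.execute("CREATE TEMP TABLE calendar (d TEXT PRIMARY KEY)")
    day = first
    while day <= last:
        cur.execute("INSERT INTO calendar (d) VALUES (?)", (day.isoformat(),))
        day += timedelta(days=1)
    # COUNT(e.created) skips the NULLs produced by unmatched days, giving 0
    cur.execute(
        """
        SELECT c.d, COUNT(e.created)
        FROM calendar AS c
        LEFT JOIN events AS e ON DATE(e.created) = c.d
        GROUP BY c.d
        ORDER BY c.d
        """
    )
    result = cur.fetchall()
    cur.execute("DROP TABLE calendar")
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created TEXT)")
# sample data shaped like the question's: 4 rows on one day, 23 two days later
conn.executemany(
    "INSERT INTO events (created) VALUES (?)",
    [("2008-08-05 09:00:00",)] * 4 + [("2008-08-07 12:30:00",)] * 23,
)
padded = counts_per_day(conn, date(2008, 8, 5), date(2008, 8, 7))
print(padded)
```

The important detail is counting a column from the right-hand table rather than `COUNT(*)`: a day with no matching rows still produces one joined row, and `COUNT(*)` would report it as 1 instead of 0.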
