计算剩余下载时间的最佳方法是什么?

时间:2022-04-24 18:11:06

Let's say you want to calculate the remaining download time, and you have all the information needed, that is: File size, dl'ed size, size left, time elapsed, momentary dl speed, etc'. How would you calculate the remaining dl time?

假设您想要计算剩余下载时间,并且您拥有所需的所有信息,即:文件大小,大小,大小,经过的时间,瞬间dl速度等等。你如何计算剩余的dl时间?

Ofcourse, the straightforward way would be either: size left/momentary dl speed, or: (time elapsed/dl'ed size)*size left. Only that the first would be subject to deviations in the momentary speed, and the latter wouldn't adapt well to altering speeds.

当然,直接的方式是:尺寸左/瞬间dl速度,或:(经过的时间/尺寸)*尺寸左。只有第一个会受到瞬时速度偏差的影响,后者不能很好地适应速度的变化。

Must be some smarter way to do that, right? Take a look at the pirated software and music you currently download with uTorrent. It's easy to notice that it does more than the simple calculation mentioned before. Actually, I notices that sometimes when the dl speed drops, the time remaining also drops for a couple of moments until it readjusts.

必须是一些更聪明的方法,对吧?查看您目前使用uTorrent下载的盗版软件和音乐。很容易注意到它比以前提到的简单计算更有用。实际上,我注意到有时当dl速度下降时,剩余的时间也会下降一段时间,直到它重新调整为止。

7 个解决方案

#1


Well, as you said, using the absolutely current download speed isn't a great method, because it tends to fluctuate. However, something like an overall average isn't a great idea either, because there may be large fluctuations there as well.

好吧,正如你所说,使用绝对当前的下载速度并不是一个好方法,因为它往往会波动。然而,总体平均值之类的东西也不是一个好主意,因为那里也可能存在很大的波动。

Consider if I start downloading a file at the same time as 9 others. I'm only getting 10% of my normal speed, but halfway through the file, the other 9 finish. Now I'm downloading at 10x the speed I started at. My original 10% speed shouldn't be a factor in the "how much time is left?" calculation any more.

考虑我是否与其他9个人同时开始下载文件。我只获得正常速度的10%,但是在文件的中途,另外9个完成。现在我的速度是我开始时的10倍。我最初的10%速度不应该是“剩下多少时间?”的一个因素。再计算。

Personally, I'd probably take an average over the last 30 seconds or so, and use that. That should do calculations based on recent speed, without fluctuating wildly. 30 seconds may not be the right amount, it would take some experimentation to figure out a good amount.

就个人而言,我可能会在最后30秒左右取平均值,并使用它。这应该根据最近的速度进行计算,而不会大幅波动。 30秒可能不是正确的数量,需要一些实验来计算出好的数量。

Another option would be to set a sort of "fluctuation threshold", where you don't do any recalculation until the speed changes by more than that threshold. For example (random number, again, would require experimentation), you could set the threshold at 10%. Then, if you're downloading at 100kb/s, you don't recalculate the remaining time until the download speed changes to either below 90kb/s or 110kb/s. If one of those changes happens, the time is recalculated and a new threshold is set.

另一个选择是设置一种“波动阈值”,在速度变化超过该阈值之前,您不会进行任何重新计算。例如(随机数,同样需要实验),您可以将阈值设置为10%。然后,如果您以100kb / s的速度下载,则在下载速度变为低于90kb / s或110kb / s之前,不会重新计算剩余时间。如果发生其中一个更改,则重新计算时间并设置新阈值。

#2


You could use an averaging algorithm where the old values decay linearly. If S_n is the speed at time n and A_{n-1} is the average at time n-1, then define your average speed as follows.

您可以使用平均算法,其中旧值线性衰减。如果S_n是时间n的速度而A_ {n-1}是时间n-1的平均值,则按如下方式定义平均速度。

A_1 = S_1
A_2 = (S_1 + S_2)/2
A_n = S_n/(n-1) + A_{n-1}(1-1/(n-1))

A_1 = S_1 A_2 =(S_1 + S_2)/ 2 A_n = S_n /(n-1)+ A_ {n-1}(1-1 /(n-1))

In English, this means that the longer in the past a measurement occurred, the less it matters because its importance has decayed.

在英语中,这意味着过去测量发生的时间越长,重要性就越小,因为它的重要性已经衰退。

Compare this to the normal averaging algorithm: A_n = S_n/n + A_{n-1}(1-1/n)

将其与常规平均算法进行比较:A_n = S_n / n + A_ {n-1}(1-1 / n)

You could also have it geometrically decay, which would weight the most recent speeds very heavily: A_n = S_n/2 + A_{n-1}/2

你也可以让它几何衰减,这会对最近的速度产生很大的影响:A_n = S_n / 2 + A_ {n-1} / 2

If the speeds are 4,3,5,6 then A_4 = 4.5 (simple average)
A_4 = 4.75 (linear decay)
A_4 = 5.125 (geometric decay)

如果速度为4,3,5,6,则A_4 = 4.5(简单平均值)A_4 = 4.75(线性衰减)A_4 = 5.125(几何衰减)

Example in PHP

Beware that $n+1 (not $n) is the number of current data points due to PHP's arrays being zero-indexed. To match the above example set n == $n+1 or n-1 == $n

请注意$ n + 1(不是$ n)是由于PHP的数组被零索引而导致的当前数据点数。为了匹配上面的例子,设置n == $ n + 1或n-1 == $ n

<?php

$s = [4,3,5,6];

// average
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        // $n+1 = number of data points so far
        $weight = 1/($n+1);

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);


// linear decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];

    elseif ($n == 1)
        $a[$n] = ($s[$n] + $s[$n-1]) / 2;

    else
    {
        // $n = number of data points so far - 1
        $weight = 1/($n);

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);


// geometric decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        $weight = 1/2;

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);

Output

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4
  3 => float 4.5

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 4.8333333333333

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 5.125

#3


The obvious way would be something in between, you need a 'moving average' of the download speed.

显而易见的方法是介于两者之间,需要下载速度的“移动平均”。

#4


I think it's just an averaging algorithm. It averages the rate over a few seconds.

我认为这只是一个平均算法。它的平均速度超过几秒钟。

#5


What you could do also is keep track of your average speed and show a calculation of that as well.

您还可以做的是跟踪您的平均速度并显示其计算结果。

#6


EDIT: Here's what I finally suggest, I tried it and it provides quite satisfying results:

编辑:这是我最后的建议,我尝试了它,它提供了非常令人满意的结果:

I have a zero initialized array for each download speed between 0 - 500 kB/s (could be higher if you expect such speeds) in 1 kB/s steps. I sample the download speed momentarily (every second is a good interval), and increment the coresponding array item by one. Now I know how many seconds I have spent downloading the file at each speed. The sum of all these values is the elapsed time (in seconds). The sum of these values multiplied by the corresponding speed is the size downloaded so far. If I take the ratio between each value in the array and the elapsed time, assuming the speed variation pattern stabalizes, I can form a formula to predict the time each size will take. This size in this case, is the size remaining. That's what I do: I take the sum of each array item value multiplied by the corresponding speed (the index) and divided by the elapsed time. Then I divide the size left by this value, and that's the time left.

我有一个零初始化数组,每个下载速度在0 - 500 kB / s之间(如果你期望这样的速度,可能会更高),步长为1 kB / s。我暂时采样下载速度(每秒都是一个很好的间隔),并将相应的数组项增加1。现在我知道每个速度下载文件花了多少秒。所有这些值的总和是经过的时间(以秒为单位)。这些值的总和乘以相应的速度是到目前为止下载的大小。如果我取数组中每个值与经过时间之间的比率,假设速度变化模式稳定,我可以形成一个公式来预测每个尺寸所需的时间。在这种情况下,此大小是剩余的大小。这就是我所做的:我将每个数组项目值的总和乘以相应的速度(索引)并除以经过的时间。然后我将该值除以左边的大小,这就是剩下的时间。

Takes a few seconds to stabalize, and then works preety damn well.

需要几秒钟才能稳定,然后工作得非常好。

Note that this is a "complicated" average, so the method of discarding older values (moving average) might improve it even further.

请注意,这是一个“复杂”的平均值,因此丢弃旧值(移动平均值)的方法可能会进一步改善它。

#7


For anyone interested, I wrote an open-source library in C# called Progression that has a "moving-average" implementation: ETACalculator.cs.

对于任何感兴趣的人,我在C#中编写了一个名为Progression的开源库,它具有“移动平均”实现:ETACalculator.cs。

The Progression library defines an easy-to-use structure for reporting several types of progress. It also easily handles nested progress for very smooth progress reporting.

Progression库定义了一种易于使用的结构,用于报告几种类型的进度。它还可以轻松处理嵌套进度,以实现非常流畅的进度报告。

#1


Well, as you said, using the absolutely current download speed isn't a great method, because it tends to fluctuate. However, something like an overall average isn't a great idea either, because there may be large fluctuations there as well.

好吧,正如你所说,使用绝对当前的下载速度并不是一个好方法,因为它往往会波动。然而,总体平均值之类的东西也不是一个好主意,因为那里也可能存在很大的波动。

Consider if I start downloading a file at the same time as 9 others. I'm only getting 10% of my normal speed, but halfway through the file, the other 9 finish. Now I'm downloading at 10x the speed I started at. My original 10% speed shouldn't be a factor in the "how much time is left?" calculation any more.

考虑我是否与其他9个人同时开始下载文件。我只获得正常速度的10%,但是在文件的中途,另外9个完成。现在我的速度是我开始时的10倍。我最初的10%速度不应该是“剩下多少时间?”的一个因素。再计算。

Personally, I'd probably take an average over the last 30 seconds or so, and use that. That should do calculations based on recent speed, without fluctuating wildly. 30 seconds may not be the right amount, it would take some experimentation to figure out a good amount.

就个人而言,我可能会在最后30秒左右取平均值,并使用它。这应该根据最近的速度进行计算,而不会大幅波动。 30秒可能不是正确的数量,需要一些实验来计算出好的数量。

Another option would be to set a sort of "fluctuation threshold", where you don't do any recalculation until the speed changes by more than that threshold. For example (random number, again, would require experimentation), you could set the threshold at 10%. Then, if you're downloading at 100kb/s, you don't recalculate the remaining time until the download speed changes to either below 90kb/s or 110kb/s. If one of those changes happens, the time is recalculated and a new threshold is set.

另一个选择是设置一种“波动阈值”,在速度变化超过该阈值之前,您不会进行任何重新计算。例如(随机数,同样需要实验),您可以将阈值设置为10%。然后,如果您以100kb / s的速度下载,则在下载速度变为低于90kb / s或110kb / s之前,不会重新计算剩余时间。如果发生其中一个更改,则重新计算时间并设置新阈值。

#2


You could use an averaging algorithm where the old values decay linearly. If S_n is the speed at time n and A_{n-1} is the average at time n-1, then define your average speed as follows.

您可以使用平均算法,其中旧值线性衰减。如果S_n是时间n的速度而A_ {n-1}是时间n-1的平均值,则按如下方式定义平均速度。

A_1 = S_1
A_2 = (S_1 + S_2)/2
A_n = S_n/(n-1) + A_{n-1}(1-1/(n-1))

A_1 = S_1 A_2 =(S_1 + S_2)/ 2 A_n = S_n /(n-1)+ A_ {n-1}(1-1 /(n-1))

In English, this means that the longer in the past a measurement occurred, the less it matters because its importance has decayed.

在英语中,这意味着过去测量发生的时间越长,重要性就越小,因为它的重要性已经衰退。

Compare this to the normal averaging algorithm: A_n = S_n/n + A_{n-1}(1-1/n)

将其与常规平均算法进行比较:A_n = S_n / n + A_ {n-1}(1-1 / n)

You could also have it geometrically decay, which would weight the most recent speeds very heavily: A_n = S_n/2 + A_{n-1}/2

你也可以让它几何衰减,这会对最近的速度产生很大的影响:A_n = S_n / 2 + A_ {n-1} / 2

If the speeds are 4,3,5,6 then A_4 = 4.5 (simple average)
A_4 = 4.75 (linear decay)
A_4 = 5.125 (geometric decay)

如果速度为4,3,5,6,则A_4 = 4.5(简单平均值)A_4 = 4.75(线性衰减)A_4 = 5.125(几何衰减)

Example in PHP

Beware that $n+1 (not $n) is the number of current data points due to PHP's arrays being zero-indexed. To match the above example set n == $n+1 or n-1 == $n

请注意$ n + 1(不是$ n)是由于PHP的数组被零索引而导致的当前数据点数。为了匹配上面的例子,设置n == $ n + 1或n-1 == $ n

<?php

$s = [4,3,5,6];

// average
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        // $n+1 = number of data points so far
        $weight = 1/($n+1);

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);


// linear decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];

    elseif ($n == 1)
        $a[$n] = ($s[$n] + $s[$n-1]) / 2;

    else
    {
        // $n = number of data points so far - 1
        $weight = 1/($n);

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);


// geometric decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        $weight = 1/2;

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);

Output

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4
  3 => float 4.5

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 4.8333333333333

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 5.125

#3


The obvious way would be something in between, you need a 'moving average' of the download speed.

显而易见的方法是介于两者之间,需要下载速度的“移动平均”。

#4


I think it's just an averaging algorithm. It averages the rate over a few seconds.

我认为这只是一个平均算法。它的平均速度超过几秒钟。

#5


What you could do also is keep track of your average speed and show a calculation of that as well.

您还可以做的是跟踪您的平均速度并显示其计算结果。

#6


EDIT: Here's what I finally suggest, I tried it and it provides quite satisfying results:

编辑:这是我最后的建议,我尝试了它,它提供了非常令人满意的结果:

I have a zero initialized array for each download speed between 0 - 500 kB/s (could be higher if you expect such speeds) in 1 kB/s steps. I sample the download speed momentarily (every second is a good interval), and increment the coresponding array item by one. Now I know how many seconds I have spent downloading the file at each speed. The sum of all these values is the elapsed time (in seconds). The sum of these values multiplied by the corresponding speed is the size downloaded so far. If I take the ratio between each value in the array and the elapsed time, assuming the speed variation pattern stabalizes, I can form a formula to predict the time each size will take. This size in this case, is the size remaining. That's what I do: I take the sum of each array item value multiplied by the corresponding speed (the index) and divided by the elapsed time. Then I divide the size left by this value, and that's the time left.

我有一个零初始化数组,每个下载速度在0 - 500 kB / s之间(如果你期望这样的速度,可能会更高),步长为1 kB / s。我暂时采样下载速度(每秒都是一个很好的间隔),并将相应的数组项增加1。现在我知道每个速度下载文件花了多少秒。所有这些值的总和是经过的时间(以秒为单位)。这些值的总和乘以相应的速度是到目前为止下载的大小。如果我取数组中每个值与经过时间之间的比率,假设速度变化模式稳定,我可以形成一个公式来预测每个尺寸所需的时间。在这种情况下,此大小是剩余的大小。这就是我所做的:我将每个数组项目值的总和乘以相应的速度(索引)并除以经过的时间。然后我将该值除以左边的大小,这就是剩下的时间。

Takes a few seconds to stabalize, and then works preety damn well.

需要几秒钟才能稳定,然后工作得非常好。

Note that this is a "complicated" average, so the method of discarding older values (moving average) might improve it even further.

请注意,这是一个“复杂”的平均值,因此丢弃旧值(移动平均值)的方法可能会进一步改善它。

#7


For anyone interested, I wrote an open-source library in C# called Progression that has a "moving-average" implementation: ETACalculator.cs.

对于任何感兴趣的人,我在C#中编写了一个名为Progression的开源库,它具有“移动平均”实现:ETACalculator.cs。

The Progression library defines an easy-to-use structure for reporting several types of progress. It also easily handles nested progress for very smooth progress reporting.

Progression库定义了一种易于使用的结构,用于报告几种类型的进度。它还可以轻松处理嵌套进度,以实现非常流畅的进度报告。