I am a newbie in Node.js. I tried to insert 70,000 items into an array and then delete all of them:
var Stopwatch = require("node-stopwatch").Stopwatch;
var stopwatch = Stopwatch.create();
var a = []
stopwatch.start();
for (var i = 1 ; i < 70000 ; i++){
a.push((parseInt(Math.random() * 10000)) + "test");
}
for (var i = 1 ; i < 70000 ; i++){
a.splice(0,1);
}
stopwatch.stop();
console.log("End: " + stopwatch.elapsedMilliseconds + " : " + a.length);
It works fine and the output is:
PS C:\Users\Documents\VSCode> node test.js
End: 51 : 0
But when I increase the number of items to 72,000, it takes far longer to finish:
var Stopwatch = require("node-stopwatch").Stopwatch;
var stopwatch = Stopwatch.create();
var a = []
stopwatch.start();
for (var i = 1 ; i < 72000 ; i++){
a.push((parseInt(Math.random() * 10000)) + "test");
}
for (var i = 1 ; i < 72000 ; i++){
a.splice(0,1);
}
stopwatch.stop();
console.log("End: " + stopwatch.elapsedMilliseconds + " : " + a.length);
And the output is:
End: 9554 : 0
Why does this happen? Only 2,000 more items were added, yet it takes so much longer.
Node.js version is: v6.11.3
2 Answers
#1
3
V8 developer here. Removing (or inserting) array elements at the start (at array[0]) is generally very expensive, because all the remaining elements have to be moved. Essentially, what the engine has to do under the hood for every one of these .splice(0, 1) operations is:
for (var j = 0; j < a.length - 1; j++) {
a[j] = a[j+1];
}
a.length = a.length - 1;
In some cases, V8 can employ a trick under the hood where the beginning of the object is moved instead -- in the fast cases, you can see the amazing speedup that this trick provides. However, for technical reasons, this trick cannot be applied for arrays beyond a certain size. The resulting "slowdown" is actually the "true" speed of this very expensive operation.
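To make the cost difference concrete, here is a minimal timing sketch (not from the answer); N is kept small so the quadratic case finishes quickly, and absolute numbers will vary by machine and Node version:

```javascript
// Compare deleting from the front (O(n) per removal, O(n^2) total)
// with deleting from the back (O(1) per removal, O(n) total).
function timeIt(label, fn) {
  var start = Date.now();
  fn();
  console.log(label + ": " + (Date.now() - start) + " ms");
}

var N = 10000;

timeIt("splice(0, 1) from the front", function () {
  var a = [];
  for (var i = 0; i < N; i++) a.push(i);
  while (a.length > 0) a.splice(0, 1); // shifts all remaining elements
});

timeIt("pop() from the back", function () {
  var a = [];
  for (var i = 0; i < N; i++) a.push(i);
  while (a.length > 0) a.pop(); // no elements need to move
});
```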
If you want to delete array elements fast, delete them from the end (at array[array.length - 1]), e.g. using Array.pop(). If you want to delete all elements in one go, just set array.length = 0. If you need fast FIFO/"queue" semantics, consider taking inspiration from ring buffers: have a "cursor" for the next element to be read/returned, and only shrink the array when there's a big chunk of elements to be freed up. Roughly:
function Queue() {
this.enqueue = function(x) {
this.array_.push(x);
}
this.dequeue = function() {
var x = this.array_[this.cursor_++];
// Free up space if half the array is unused.
if (this.cursor_ > this.array_.length / 2) {
this.array_.splice(0, this.cursor_);
this.cursor_ = 0;
}
return x;
}
this.array_ = [];
this.cursor_ = 0;
}
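For illustration, here is a small standalone usage sketch of this queue (the constructor above is repeated so the snippet runs on its own):

```javascript
// The Queue from the answer, repeated so this snippet runs standalone.
function Queue() {
  this.enqueue = function (x) {
    this.array_.push(x);
  };
  this.dequeue = function () {
    var x = this.array_[this.cursor_++];
    // Free up space if half the array is unused.
    if (this.cursor_ > this.array_.length / 2) {
      this.array_.splice(0, this.cursor_);
      this.cursor_ = 0;
    }
    return x;
  };
  this.array_ = [];
  this.cursor_ = 0;
}

var q = new Queue();
q.enqueue("a");
q.enqueue("b");
q.enqueue("c");
console.log(q.dequeue()); // "a" -- FIFO order
console.log(q.dequeue()); // "b" -- this dequeue also trims the backing array
console.log(q.array_.length); // 1 -- consumed elements were freed in bulk
```

Note that dequeuing advances the cursor without moving any elements; the expensive splice only runs occasionally, over a large chunk at once.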
Side note: It doesn't matter here, but for the record, to push 70,000 elements into your array, your loop should start at 0: for (var i = 0; i < 70000; i++) {...}. As written, you're only pushing 69,999 elements.
Side note 2: Rounding a double to an integer via "parseInt" is pretty slow, because it first formats the double as a string, then reads that string back as an integer. A faster way would be Math.floor(Math.random() * 10000). (For the purposes of this test, you could also simply push i.)
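A quick sketch of the difference; a fixed value is used here so the result is deterministic:

```javascript
// parseInt converts its argument to a string ("4821.53") and parses it
// back; Math.floor truncates numerically, skipping the string round trip.
var x = 4821.53;
console.log(parseInt(x, 10)); // 4821
console.log(Math.floor(x));   // 4821, same result without stringification
```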
#2
0
Interesting, I did something like
if (i % 1000 === 0) {
console.log(i + " " + stopwatch.elapsedMilliseconds + " : " + a.length);
}
inside the 2nd loop. (This is painful to compute, but it helps diagnose the problem.)
I did not suffer the performance loss. But I think I found why "only" 2,000 more operations is that hard for Node. First, my measurements: [loop max, unit, 3 benchmark results]
70k: [ms] ~26k, ~25.7k, ~26k
72k: [ms] ~25.6k, ~27k, ~25.7k
Okay, when I looked at the logging, I saw that the last 10k records were computed almost instantly. I think that splice removes 1 item from the front and then shifts the array one index "toward the start", element by element. Let's change our test to 10 arrays of 10k records each to see if that is much better. I will do this in the laziest way:
var Stopwatch = require("node-stopwatch").Stopwatch;
var stopwatch = Stopwatch.create();
var a1 = [], a2 = [], a3 = [], a4 = [], a5 = [];
var a6 = [], a7 = [], a8 = [], a9 = [], a10 = [];
stopwatch.start();
function fill (arr) {
for (var i = 1 ; i < 10000 ; i++){
arr.push((parseInt(Math.random() * 10000)) + "test");
}
}
fill(a1); fill(a2); fill(a3); fill(a4); fill(a5);
fill(a6); fill(a7); fill(a8); fill(a9); fill(a10);
let removeCount = 0;
function unfill(arr) {
for (var i = 1 ; i < 10000 ; i++){
arr.splice(0,1);
removeCount++;
if (i % 1000 === 0) {
console.log(i + " " + stopwatch.elapsedMilliseconds + " : " + arr.length);
}
}
}
unfill(a1); unfill(a2); unfill(a3); unfill(a4); unfill(a5);
unfill(a6); unfill(a7); unfill(a8); unfill(a9); unfill(a10);
stopwatch.stop();
console.log("End: " + stopwatch.elapsedMilliseconds + " removeCount " + removeCount);
And, yes... I did not answer why exactly your PC has such a performance loss between 70k and 72k records. I believe it is machine dependent... possibly a lack of RAM, but don't take that as fact; I don't know.
I did work out how to improve it, though. The 100k(-10) removals across 10 arrays took around 73-74 ms. I think you can store the data in a 2D array and adapt the logic to compute the length and the rest as you need.
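The 2D-array idea can be sketched as a small helper; makeChunks is a hypothetical name, not something from the answer:

```javascript
// Hypothetical helper: split one big data set into fixed-size chunks so
// each splice(0, 1) only ever shifts a small array.
function makeChunks(n, chunkSize) {
  var chunks = [];
  for (var i = 0; i < n; i += chunkSize) {
    var chunk = [];
    for (var j = i; j < Math.min(i + chunkSize, n); j++) {
      chunk.push(j + "test");
    }
    chunks.push(chunk);
  }
  return chunks;
}

var chunks = makeChunks(100000, 10000); // 10 chunks of 10k items each
var total = 0;
for (var c = 0; c < chunks.length; c++) total += chunks[c].length;
console.log(chunks.length + " chunks, " + total + " items total");
```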
Thanks for your attention.