Why do arrays of numbers sort faster than arrays of objects, and more data faster than less data, in JavaScript?

Date: 2022-09-18 17:09:09

In my Node.js application I must sort the elements of an array in descending order by a numeric value (i.e. a numeric rank). Since my application is performance-critical, I decided to design my data structure so that sorting is as fast as possible. I hypothesized that the less data each element of my array contains, the faster the sort will be. To test my hypothesis, I ran the following on three different arrays of length 10000:

EDIT: Guys, it seems there was something flawed in my original testing. The first test takes significantly longer than the ones that follow. So I have modified my test code to run a 'buffer' sort before the actual sorts. I also rotated the order of my tests over a fixed number of trials to reduce any bias that might result from the ordering of the tests themselves. I've updated the results accordingly.

Full source here: https://raw.githubusercontent.com/youngrrrr/js-array-sort-bench-test/master/arraySortTest.js

var buffer = [781197, ... ];
var sparseArray = [781197, ... ];
var sparseArray2 = [{'a' : 781197}, ...];
var denseArray = [{'a' : 781197, 'b': ['r', 'a', 'n', 'd', 'o', 'm'] }, ...];

/* buffer : for some reason, the first test always takes significantly longer than the others. I've added this to try to remove whatever bias there was before... */
console.time('buffer');
buffer.sort(compareSparse);
console.timeEnd('buffer');
console.log(buffer[0]); // prints "58"


/* sparseArray : an array whose elements are numbers */
console.time('sparse');
sparseArray.sort(compareSparse);
console.timeEnd('sparse');
console.log(sparseArray[0]); // prints "58"

/* sparseArray2 (not an accurate name, just got lazy) :
   an array whose elements are objects with a single key-value pair mapping
   an arbitrary name 'a' to a number (which we sort on) */
console.time('sparse2');
sparseArray2.sort(compareDense);
console.timeEnd('sparse2');
console.log(sparseArray2[0]); // prints "{ a: 58 }"

/* denseArray : an array whose elements are objects with two key-value
   pairs mapping an arbitrary key 'a' to a number (which we sort on) and
   another arbitrary key 'b' to an array (which is just supposed to be 
   extra data for the purpose of my hypothesis) */
console.time('dense');
denseArray.sort(compareDense);
console.timeEnd('dense');
console.log(denseArray[0]); // prints "{ a: 58, b: [ 'r', 'a', 'n', 'd', 'o', 'm' ] }"

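// Comparator for arrays of plain numbers (ascending order).
function compareSparse(a, b) {
    if (a < b) {
        return -1;
    } else if (a > b) {
        return 1;
    } else {
        return 0;
    }
}

// Comparator for arrays of objects, ordered by the numeric property 'a' (ascending order).
function compareDense(a, b) {
    if (a.a < b.a) {
        return -1;
    } else if (a.a > b.a) {
        return 1;
    } else {
        return 0;
    }
}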

Old test:

After 25 trials (I know, small sample size but I did this all manually) I got the following times for average sort time:

  • sparseArray: (24 + 23 + 21 + 23 + 21 + 22 + 22 + 22 + 22 + 22 + 21 + 20 + 22 + 24 + 24 + 21 + 22 + 22 + 25 + 23 + 24 + 23 + 21 + 21 + 23) / 25 = 22.32ms
  • sparseArray2: (4 + 4 + 4 + 4 + 4 + 5 + 5 + 5 + 5 + 4 + 6 + 5 + 5 + 4 + 5 + 4 + 4 + 4 + 5 + 6 + 4 + 5 + 4 + 4 + 5) / 25 = 4.56ms
  • denseArray: (5 + 5 + 4 + 5 + 5 + 5 + 5 + 5 + 5 + 6 + 5 + 5 + 4 + 4 + 5 + 5 + 5 + 4 + 5 + 5 + 6 + 5 + 5 + 5 + 4) / 25 = 4.88ms

New test:

After 15 trials (I know, small sample size but I did this all manually) I got the following times for average sort time:

  • sparseArray: (4 + 4 + 4 + 4 + 3 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 3 + 4 + 4) / 15 = 3.867ms
  • sparseArray2: (4 + 4 + 4 + 6 + 5 + 4 + 4 + 4 + 4 + 5 + 5 + 4 + 5 + 5 + 5) / 15 = 4.533ms
  • denseArray: (4 + 4 + 4 + 5 + 5 + 4 + 4 + 4 + 4 + 5 + 5 + 4 + 5 + 5 + 5) / 15 = 4.467ms
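
Since the trials above were run by hand, here is a minimal sketch of how the warm-up ('buffer') sort and the rotated test order could be automated in Node.js. This is not the original test script: the helper names makeCases, timeSort and runTrials are made up for illustration, and the data is freshly generated random numbers rather than the arrays from the linked source, so the timings it prints will not match the ones listed above.

```javascript
// Hypothetical harness; makeCases, timeSort and runTrials are illustrative
// names, not part of the original arraySortTest.js.
function compareSparse(a, b) { return a < b ? -1 : a > b ? 1 : 0; }
function compareDense(a, b) { return a.a < b.a ? -1 : a.a > b.a ? 1 : 0; }

// Build the three array shapes from the same random keys so that
// every case sorts the same sequence of numbers.
function makeCases(length) {
    const keys = Array.from({ length }, () => Math.floor(Math.random() * 1e6));
    return [
        { name: 'sparseArray', data: keys, compare: compareSparse },
        { name: 'sparseArray2', data: keys.map(k => ({ a: k })), compare: compareDense },
        {
            name: 'denseArray',
            data: keys.map(k => ({ a: k, b: ['r', 'a', 'n', 'd', 'o', 'm'] })),
            compare: compareDense,
        },
    ];
}

// Sort a fresh copy (so every trial starts unsorted) and return the
// elapsed time in milliseconds.
function timeSort(data, compare) {
    const copy = data.slice();
    const start = process.hrtime.bigint();
    copy.sort(compare);
    const end = process.hrtime.bigint();
    return Number(end - start) / 1e6;
}

function runTrials(cases, trials) {
    // Warm-up pass: results are discarded, like the 'buffer' sort.
    for (const c of cases) timeSort(c.data, c.compare);

    const totals = new Map(cases.map(c => [c.name, 0]));
    for (let t = 0; t < trials; t++) {
        // Rotate the starting case on every trial to reduce ordering bias.
        for (let i = 0; i < cases.length; i++) {
            const c = cases[(t + i) % cases.length];
            totals.set(c.name, totals.get(c.name) + timeSort(c.data, c.compare));
        }
    }

    const averages = {};
    for (const [name, total] of totals) averages[name] = total / trials;
    return averages;
}

const averages = runTrials(makeCases(10000), 15);
console.log(averages); // average milliseconds per case; varies by machine and engine
```

Sorting a copy on every trial matters here: sorting the same array in place would make every trial after the first operate on already sorted input.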

So I've come to the following conclusions:

  • Arrays of numbers sort faster than arrays of objects whose values are numbers. This makes sense intuitively.
  • For some reason, and paradoxically so, more data in each element results in faster sorting than less data (as evidenced by the sparseArray2 vs denseArray runtimes).

What I want to know is:

  • Are these conclusions backed by any documentation/something other than my testing? That is, did I reach the correct conclusions?
  • And why? Why do arrays of numbers sort faster than arrays of objects (makes sense intuitively, but what's the explanation behind this, if any)? Not only this, but why do arrays containing MORE data seem to sort faster than those containing less data?

And note, I'm not married to these conclusions or anything. The sample size is small and my testing has proven to be flawed before, so my results may very well just be the result of bad testing. Also, there seem to be various factors I am unaware of that could be affecting the results (as Ryan O'Hara pointed out in my earlier post). The point of this post is to discover any fact-based explanation for sorting behavior in JavaScript.

Thanks for reading!

2 Answers

#1


4  

Are these conclusions backed by any documentation/something other than my testing? That is, did I reach the correct conclusions?

No specification dictates how .sort() must be implemented, so its performance characteristics can only be discovered by performance testing the data sets you care about in the browsers or JS implementations of interest. Pretty much all performance questions are best answered by testing in the specific circumstances that matter to you. Generalizations beyond that can easily be misleading or wrong and do not necessarily apply to all configurations.

And why? Why do arrays of numbers sort faster than arrays of objects (makes sense intuitively, but what's the explanation behind this, if any)? Not only this, but why do arrays containing MORE data seem to sort faster than those containing less data?

The performance of a given sort with a custom comparison function is governed by the following factors:

  1. The length of the array. A longer array will require more sort comparisons.
  2. The smarts of the internal sort algorithm in keeping the number of sort comparisons as small as possible.
  3. The performance of the custom sort function (how long it takes to execute a given sort comparison).

So, if you hold the custom sort function, the .sort() implementation you're using, and the data in the array constant, then a longer array will take longer to sort.
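
To make the first factor concrete, here is a small sketch that counts how many times the comparison function gets called for arrays of different lengths. The helper name countComparisons is made up for this example, and the exact counts depend on the engine's sort algorithm and on the random input, so only the trend is meaningful.

```javascript
// Hypothetical helper: counts how often .sort() invokes the comparison
// function for a random array of the given length.
function countComparisons(length) {
    const data = Array.from({ length }, () => Math.random());
    let calls = 0;
    data.sort((a, b) => {
        calls++;          // one call per comparison the engine performs
        return a - b;     // ascending numeric order
    });
    return calls;
}

for (const n of [10, 100, 1000, 10000]) {
    // Counts are engine- and input-dependent, so no expected values are shown.
    console.log(n, countComparisons(n));
}
```

The total sort time is roughly this call count multiplied by the cost of a single comparison, which is why the third factor matters as well.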

But if you change both the first and the last factor above (one in a favorable direction and one in a less favorable direction), as you do when you go from sorting an array of numbers to sorting an array of objects by a specific property value, then the change in speed depends on whether the net effect is positive or negative. That in turn depends on several things that are hard to predict outside of a very specific implementation and data set and a lot of testing. In other words, it could go either way.

For some test info on sorting an array of numbers vs. sorting a property from an array of objects, see http://jsperf.com/sort-value-vs-property. To no surprise, it is slightly faster to sort the array of numbers though not by a lot.

#2


0  

I believe that it has to do with the way sorting works in JavaScript. If no comparison function is supplied, numbers are converted to strings before sorting, an action that takes some time.
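
A short illustration of that conversion (the array ranks is made-up sample data): without a comparison function the elements are compared as strings, while a numeric comparison function compares them as numbers.

```javascript
const ranks = [10, 9, 1, 100]; // made-up sample data

// No comparison function: elements are compared as the strings
// "10", "9", "1" and "100".
const defaultOrder = ranks.slice().sort();

// Numeric comparison function: elements are compared as numbers.
const numericOrder = ranks.slice().sort((a, b) => a - b);

console.log(defaultOrder); // [ 1, 10, 100, 9 ]
console.log(numericOrder); // [ 1, 9, 10, 100 ]
```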
