为什么。net中的多维数组比普通数组慢?

时间:2022-05-05 20:59:10

Edit: I apologize everybody. I used the term "jagged array" when I actually meant to say "multi-dimensional array" (as can be seen in my example below). I apologize for using the incorrect name. I actually found jagged arrays to be faster than multi-dimensional ones! I have added my measurements for jagged arrays.

编辑:我每个人道歉。当我真正想说“多维数组”时,我使用了术语“交错数组”(如下面的示例所示)。我很抱歉用错了名字。我发现交错数组比多维数组要快!我已经添加了对交错数组的度量。

I was trying to use a jagged multi-dimensional array today, when I noticed that it's performance is not as I would have expected. Using a single-dimensional array and manually calculating indices was much faster (almost two times) than using a 2D array. I wrote a test using 1024*1024 arrays (initialized to random values), for 1000 iterations, and I got the following results on my machine:

今天我尝试使用一个参差不齐的多维数组,当我注意到它的性能并不是我所期望的那样。使用单维数组和手工计算索引要比使用2D数组快得多(几乎是两倍)。我使用1024*1024数组(初始化为随机值)编写了一个测试,进行了1000次迭代,结果如下:

sum(double[], int): 2738 ms (100%)
sum(double[,]):     5019 ms (183%)
sum(double[][]):    2540 ms ( 93%)

This is my test code:

这是我的测试代码:

public static double sum(double[] d, int l1) {
    // assuming the array is rectangular
    double sum = 0;
    int l2 = d.Length / l1;
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            sum += d[i * l2 + j];
    return sum;
}

public static double sum(double[,] d) {
    double sum = 0;
    int l1 = d.GetLength(0);
    int l2 = d.GetLength(1);
    for (int i = 0; i < l1; ++i)
        for (int j = 0; j < l2; ++j)
            sum += d[i, j];
    return sum;
}

public static double sum(double[][] d) {
    double sum = 0;
    for (int i = 0; i < d.Length; ++i)
        for (int j = 0; j < d[i].Length; ++j)
            sum += d[i][j];
    return sum;
}

public static void Main() {
    Random random = new Random();
    const int l1  = 1024, l2 = 1024;
    double[ ] d1  = new double[l1 * l2];
    double[,] d2  = new double[l1 , l2];
    double[][] d3 = new double[l1][];

    for (int i = 0; i < l1; ++i) {
        d3[i] = new double[l2];
        for (int j = 0; j < l2; ++j)
            d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
    }
    //
    const int iterations = 1000;
    TestTime(sum, d1, l1, iterations);
    TestTime(sum, d2, iterations);
    TestTime(sum, d3, iterations);
}

Further investigation showed that the IL for the second method is 23% larger than that of the first method. (Code size 68 vs 52.) This is mostly due to calls to System.Array::GetLength(int). The compiler also emits calls to Array::Get for the jagged multi-dimensional array, whereas it simply calls ldelem for the simple array.

进一步研究表明,第二种方法的IL值比第一种方法高23%。(代码大小68对52)这主要是由于调用System.Array: GetLength(int)。编译器也会发出对数组的调用::Get表示交错的多维数组,而它只对简单数组调用ldelem。

So I am wondering, why is access through multi-dimensional arrays slower than normal arrays? I would have assumed the compiler (or JIT) would do something similar to what I did in my first method, but this was not actually the case.

所以我想知道,为什么通过多维数组访问要比普通数组慢?我本以为编译器(或JIT)会做一些类似于我在第一个方法中所做的事情,但实际情况并非如此。

Could you plese help me understand why this is happening the way it is?

你能帮我理解一下为什么会这样吗?


Update: Following Henk Holterman's suggestion, here is the implementation of TestTime:

更新:根据Henk Holterman的建议,以下是测试时间的实现:

public static void TestTime<T, TR>(Func<T, TR> action, T obj,
                                   int iterations)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
        action(obj);
    Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}

public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1,
                                        T2 obj2, int iterations)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
        action(obj1, obj2);
    Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
}

9 个解决方案

#1


40  

Single dimensional arrays with a lower bound of 0 are a different type to either multi-dimensional or non-0 lower bound arrays within IL (vector vs array IIRC). vector is simpler to work with - to get to element x, you just do pointer + size * x. For an array, you have to do pointer + size * (x-lower bound) for a single dimensional array, and yet more arithmetic for each dimension you add.

单维数组的下界为0,是不同于在IL(向量vs数组IIRC)中的多维或非0下界数组的一种类型。向量是更简单的工作——为了得到元素x,你只需要做指针+ size * x。对于一个数组,你必须要做一个单一维度数组的指针+大小* (x-下界),而且你要添加的每个维度都要做更多的算术运算。

Basically the CLR is optimised for the vastly more common case.

基本上,CLR是针对更常见的情况进行优化的。

#2


8  

Array bounds checking?

数组边界检查吗?

The single-dimension array has a length member that you access directly - when compiled this is just a memory read.

单维数组有一个长度成员,您可以直接访问它——当编译时,这只是一个内存读取。

The multidimensional array requires a GetLength(int dimension) method call that processes the argument to get the relevant length for that dimension. That doesn't compile down to a memory read, so you get a method call, etc.

多维数组需要一个GetLength(int维)方法调用,该方法处理参数以获得该维的相关长度。它不会编译为一个内存读取,所以你会得到一个方法调用,等等。

In addition that GetLength(int dimension) will do a bounds check on the parameter.

此外,GetLength(int维)将对参数进行边界检查。

#3


4  

Interestly, I ran the following code from above using VS2008 NET3.5SP1 Win32 on a Vista box, and in release/optimize the difference was barely measurable, while debug/noopt the multi-dim arrays were much slower. (I ran the three tests twice to reduce JIT affects on the second set.)

有趣的是,我在Vista上使用VS2008 NET3.5SP1 Win32运行了上面的代码,在发布/优化中,差异几乎无法衡量,而调试/noopt的multidim数组要慢得多。(我对这三个测试进行了两次测试,以减少第二组的JIT影响。)

  Here are my numbers: 
    sum took 00:00:04.3356535
    sum took 00:00:04.1957663
    sum took 00:00:04.5523050
    sum took 00:00:04.0183060
    sum took 00:00:04.1785843 
    sum took 00:00:04.4933085

Look at the second set of three numbers. The difference is not enough for me to code everything in single dimension arrays.

看看第二组三个数字。对我来说,用单维数组来编码是不够的。

Although I haven't posted them, in Debug/unoptimized the multidimension vs. single/jagged does make a huge difference.

尽管我没有发布它们,但是在Debug/ un优化的多维与单/参差之间确实产生了巨大的差异。

Full program:

完整的计划:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace single_dimension_vs_multidimension
{
    class Program
    {


        public static double sum(double[] d, int l1) {    // assuming the array is rectangular 
            double sum = 0; 
            int l2 = d.Length / l1; 
            for (int i = 0; i < l1; ++i)   
                for (int j = 0; j < l2; ++j)   
                    sum += d[i * l2 + j];   
            return sum;
        }

        public static double sum(double[,] d)
        {
            double sum = 0;  
            int l1 = d.GetLength(0);
            int l2 = d.GetLength(1);   
            for (int i = 0; i < l1; ++i)    
                for (int j = 0; j < l2; ++j)   
                    sum += d[i, j]; 
            return sum;
        }
        public static double sum(double[][] d)
        {
            double sum = 0;   
            for (int i = 0; i < d.Length; ++i) 
                for (int j = 0; j < d[i].Length; ++j) 
                    sum += d[i][j];
            return sum;
        }
        public static void TestTime<T, TR>(Func<T, TR> action, T obj, int iterations) 
        { 
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)      
                action(obj);
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1, T2 obj2, int iterations)
        {
            Stopwatch stopwatch = Stopwatch.StartNew(); 
            for (int i = 0; i < iterations; ++i)    
                action(obj1, obj2); 
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void Main() {   
            Random random = new Random(); 
            const int l1  = 1024, l2 = 1024; 
            double[ ] d1  = new double[l1 * l2]; 
            double[,] d2  = new double[l1 , l2];  
            double[][] d3 = new double[l1][];   
            for (int i = 0; i < l1; ++i)
            {
                d3[i] = new double[l2];   
                for (int j = 0; j < l2; ++j)  
                    d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
            }    
            const int iterations = 1000;
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);

            TestTime<double[][], double>(sum, d3, iterations);
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);
            TestTime<double[][], double>(sum, d3, iterations); 
        }

    }
}

#4


3  

Because a multidimensional array is just a syntactic sugar as it is really just a flat array with some index calculation magic. On the other hand, a jagged array is like, an array of arrays. With a two-dimensional array, accessing an element requires reading the memory just once, while with a two level jagged array, you need to read the memory twice.

因为多维数组只是一种语法糖,因为它实际上只是一个平面数组,具有某种索引计算的魔力。另一方面,锯齿状数组就像数组一样。对于一个二维数组,访问一个元素只需要读取一次内存,而使用两个级别的交错数组,则需要读取两次内存。

EDIT: Apparently the original poster mixed up "jagged arrays" with "multi-dimensional arrays" so my reasoning doesn't exactly stand. For the real reason, check Jon Skeet's heavy artillery answer above.

编辑:很显然,最初的海报将“锯齿状数组”与“多维数组”混合在一起,所以我的推理并不准确。真正的原因是,看看乔恩·斯凯的重炮回答。

#5


2  

Jagged arrays are arrays of class references (other arrays) up until the leaf array which may be an array of a primitive type. Hence memory allocated for each of the other arrays can be all over the place.

交错数组是类引用(其他数组)的数组,直到叶数组(可能是原始类型的数组)。因此,分配给每个其他数组的内存可以遍历整个位置。

Whereas a mutli-dimensional array has its memory allocated in one contigeous lump.

而多维数组的内存分配在一个连续块中。

#6


1  

I think it has got something to do for the fact that jagged arrays are actually arrays of arrays hence there are two levels of indirection to get to the actual data.

我认为它对锯齿状数组实际上是数组这一事实有所帮助,因此有两个层次的间接数据来获取实际数据。

#7


1  

I'm with everyone else here

我和其他人在一起

I had a program with three dimension array, let me tell you that when I moved the array into two dimension, I saw a huge boost and then I moved to a one dimension array.

我有一个带有三维数组的程序,我告诉你当我把数组移到二维时,我看到了一个巨大的提升然后我移到了一维数组。

In the end, I think I saw over 500% performance boost in the execution time.

最后,我认为我在执行时间中看到了超过500%的性能提升。

only drawback was the complexity added to find out where was what in the one dimensional array, versus the three one.

唯一的缺点是增加了复杂性,找出在一维数组中是什么,而不是三个。

#8


1  

I think multi-dimensional is slower, the runtime has to check two or more(three dimensional and up) bounds check.

我认为多维是比较慢的,运行时必须检查两个或更多(三维和向上)边界检查。

#9


-1  

Bounds checking. Your "j" variable could exceed l2 provided "i" was less than l1. This would not be legal in the second example

边界检查。如果“i”小于l1,则“j”变量可以超过l2。在第二个例子中,这是不合法的

#1


40  

Single dimensional arrays with a lower bound of 0 are a different type to either multi-dimensional or non-0 lower bound arrays within IL (vector vs array IIRC). vector is simpler to work with - to get to element x, you just do pointer + size * x. For an array, you have to do pointer + size * (x-lower bound) for a single dimensional array, and yet more arithmetic for each dimension you add.

单维数组的下界为0,是不同于在IL(向量vs数组IIRC)中的多维或非0下界数组的一种类型。向量是更简单的工作——为了得到元素x,你只需要做指针+ size * x。对于一个数组,你必须要做一个单一维度数组的指针+大小* (x-下界),而且你要添加的每个维度都要做更多的算术运算。

Basically the CLR is optimised for the vastly more common case.

基本上,CLR是针对更常见的情况进行优化的。

#2


8  

Array bounds checking?

数组边界检查吗?

The single-dimension array has a length member that you access directly - when compiled this is just a memory read.

单维数组有一个长度成员,您可以直接访问它——当编译时,这只是一个内存读取。

The multidimensional array requires a GetLength(int dimension) method call that processes the argument to get the relevant length for that dimension. That doesn't compile down to a memory read, so you get a method call, etc.

多维数组需要一个GetLength(int维)方法调用,该方法处理参数以获得该维的相关长度。它不会编译为一个内存读取,所以你会得到一个方法调用,等等。

In addition that GetLength(int dimension) will do a bounds check on the parameter.

此外,GetLength(int维)将对参数进行边界检查。

#3


4  

Interestly, I ran the following code from above using VS2008 NET3.5SP1 Win32 on a Vista box, and in release/optimize the difference was barely measurable, while debug/noopt the multi-dim arrays were much slower. (I ran the three tests twice to reduce JIT affects on the second set.)

有趣的是,我在Vista上使用VS2008 NET3.5SP1 Win32运行了上面的代码,在发布/优化中,差异几乎无法衡量,而调试/noopt的multidim数组要慢得多。(我对这三个测试进行了两次测试,以减少第二组的JIT影响。)

  Here are my numbers: 
    sum took 00:00:04.3356535
    sum took 00:00:04.1957663
    sum took 00:00:04.5523050
    sum took 00:00:04.0183060
    sum took 00:00:04.1785843 
    sum took 00:00:04.4933085

Look at the second set of three numbers. The difference is not enough for me to code everything in single dimension arrays.

看看第二组三个数字。对我来说,用单维数组来编码是不够的。

Although I haven't posted them, in Debug/unoptimized the multidimension vs. single/jagged does make a huge difference.

尽管我没有发布它们,但是在Debug/ un优化的多维与单/参差之间确实产生了巨大的差异。

Full program:

完整的计划:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace single_dimension_vs_multidimension
{
    class Program
    {


        public static double sum(double[] d, int l1) {    // assuming the array is rectangular 
            double sum = 0; 
            int l2 = d.Length / l1; 
            for (int i = 0; i < l1; ++i)   
                for (int j = 0; j < l2; ++j)   
                    sum += d[i * l2 + j];   
            return sum;
        }

        public static double sum(double[,] d)
        {
            double sum = 0;  
            int l1 = d.GetLength(0);
            int l2 = d.GetLength(1);   
            for (int i = 0; i < l1; ++i)    
                for (int j = 0; j < l2; ++j)   
                    sum += d[i, j]; 
            return sum;
        }
        public static double sum(double[][] d)
        {
            double sum = 0;   
            for (int i = 0; i < d.Length; ++i) 
                for (int j = 0; j < d[i].Length; ++j) 
                    sum += d[i][j];
            return sum;
        }
        public static void TestTime<T, TR>(Func<T, TR> action, T obj, int iterations) 
        { 
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)      
                action(obj);
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void TestTime<T1, T2, TR>(Func<T1, T2, TR> action, T1 obj1, T2 obj2, int iterations)
        {
            Stopwatch stopwatch = Stopwatch.StartNew(); 
            for (int i = 0; i < iterations; ++i)    
                action(obj1, obj2); 
            Console.WriteLine(action.Method.Name + " took " + stopwatch.Elapsed);
        }
        public static void Main() {   
            Random random = new Random(); 
            const int l1  = 1024, l2 = 1024; 
            double[ ] d1  = new double[l1 * l2]; 
            double[,] d2  = new double[l1 , l2];  
            double[][] d3 = new double[l1][];   
            for (int i = 0; i < l1; ++i)
            {
                d3[i] = new double[l2];   
                for (int j = 0; j < l2; ++j)  
                    d3[i][j] = d2[i, j] = d1[i * l2 + j] = random.NextDouble();
            }    
            const int iterations = 1000;
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);

            TestTime<double[][], double>(sum, d3, iterations);
            TestTime<double[], int, double>(sum, d1, l1, iterations);
            TestTime<double[,], double>(sum, d2, iterations);
            TestTime<double[][], double>(sum, d3, iterations); 
        }

    }
}

#4


3  

Because a multidimensional array is just a syntactic sugar as it is really just a flat array with some index calculation magic. On the other hand, a jagged array is like, an array of arrays. With a two-dimensional array, accessing an element requires reading the memory just once, while with a two level jagged array, you need to read the memory twice.

因为多维数组只是一种语法糖,因为它实际上只是一个平面数组,具有某种索引计算的魔力。另一方面,锯齿状数组就像数组一样。对于一个二维数组,访问一个元素只需要读取一次内存,而使用两个级别的交错数组,则需要读取两次内存。

EDIT: Apparently the original poster mixed up "jagged arrays" with "multi-dimensional arrays" so my reasoning doesn't exactly stand. For the real reason, check Jon Skeet's heavy artillery answer above.

编辑:很显然,最初的海报将“锯齿状数组”与“多维数组”混合在一起,所以我的推理并不准确。真正的原因是,看看乔恩·斯凯的重炮回答。

#5


2  

Jagged arrays are arrays of class references (other arrays) up until the leaf array which may be an array of a primitive type. Hence memory allocated for each of the other arrays can be all over the place.

交错数组是类引用(其他数组)的数组,直到叶数组(可能是原始类型的数组)。因此,分配给每个其他数组的内存可以遍历整个位置。

Whereas a mutli-dimensional array has its memory allocated in one contigeous lump.

而多维数组的内存分配在一个连续块中。

#6


1  

I think it has got something to do for the fact that jagged arrays are actually arrays of arrays hence there are two levels of indirection to get to the actual data.

我认为它对锯齿状数组实际上是数组这一事实有所帮助,因此有两个层次的间接数据来获取实际数据。

#7


1  

I'm with everyone else here

我和其他人在一起

I had a program with three dimension array, let me tell you that when I moved the array into two dimension, I saw a huge boost and then I moved to a one dimension array.

我有一个带有三维数组的程序,我告诉你当我把数组移到二维时,我看到了一个巨大的提升然后我移到了一维数组。

In the end, I think I saw over 500% performance boost in the execution time.

最后,我认为我在执行时间中看到了超过500%的性能提升。

only drawback was the complexity added to find out where was what in the one dimensional array, versus the three one.

唯一的缺点是增加了复杂性,找出在一维数组中是什么,而不是三个。

#8


1  

I think multi-dimensional is slower, the runtime has to check two or more(three dimensional and up) bounds check.

我认为多维是比较慢的,运行时必须检查两个或更多(三维和向上)边界检查。

#9


-1  

Bounds checking. Your "j" variable could exceed l2 provided "i" was less than l1. This would not be legal in the second example

边界检查。如果“i”小于l1,则“j”变量可以超过l2。在第二个例子中,这是不合法的