Performance issues when using LINQ "everywhere"?

Time: 2022-09-17 21:52:04

After upgrading to ReSharper 5, I get even more useful tips on code improvements. One I now see everywhere is a tip to replace foreach statements with LINQ queries. Take this example:

private Ninja FindNinjaById(int ninjaId)
{
    foreach (var ninja in Ninjas)
    {
        if (ninja.Id == ninjaId)
            return ninja;
    }
    return null;
}

ReSharper suggests replacing it with the following LINQ version:

private Ninja FindNinjaById(int ninjaId)
{
    return Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId);
}

This looks fine, and I'm sure replacing this one foreach causes no performance problem. But is it something I should do in general? Or might I run into performance problems with all these LINQ queries everywhere?

9 answers

#1


18  

You need to understand what the LINQ query is going to do "under the hood" and compare that to your existing code before you can know whether you should change it. I don't mean that you need to know the exact code that will be generated, but you do need to know the basic idea of how it goes about performing the operation. In your example, I would surmise that LINQ works about the same as your code, and because the LINQ statement is more compact and descriptive, I would prefer it. There are times when LINQ may not be the ideal choice, though probably not many. In general, I think just about any looping construct can be replaced by an equivalent LINQ construct.
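
Here is what "under the hood" means for this example. The predicate overload of FirstOrDefault behaves, conceptually, like the loop below. This is a simplified sketch, not the actual .NET source: the real method also has fast paths for some collection types. The name MyFirstOrDefault is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SketchExtensions
{
    // Conceptual equivalent of Enumerable.FirstOrDefault(source, predicate).
    // Hypothetical name; not the actual .NET implementation.
    public static T MyFirstOrDefault<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        // Same shape as the hand-written loop in the question:
        // walk the sequence and return the first element that matches.
        foreach (T item in source)
        {
            if (predicate(item)) // per element, the extra cost is one delegate call
                return item;
        }

        // Nothing matched: null for reference types such as Ninja, 0 for int, etc.
        return default(T);
    }
}
```

With this in place, Ninjas.MyFirstOrDefault(ninja => ninja.Id == ninjaId) reads exactly like the ReSharper suggestion.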

#2


15  

Let me start by saying that I love LINQ for its expressiveness and use it all the time without any problem.

There are however some differences in performance. Normally they are small enough to ignore, but in the critical path of your application, there might be times you want to optimize them away.

Here are the differences you should be aware of that could matter for performance:

  • LINQ uses delegate calls extensively, and delegate invocations are (a very tiny bit) slower than direct method calls and, of course, slower than inline code.
  • A delegate is a method pointer wrapped in an object. That object needs to be created.
  • LINQ operators usually return a new object (an iterator) that allows looping through the collection. Chained LINQ operators therefore create multiple new objects.
  • When your lambda uses variables from the enclosing scope (a closure), those variables have to be wrapped in an object as well, which also needs to be created.
  • Many LINQ operators call GetEnumerator on a collection to iterate it. Calling GetEnumerator usually results in the creation of yet another object.
  • Iterating the collection is done through the IEnumerator<T> interface. Interface calls are a bit slower than normal method calls.
  • Enumerators often need to be disposed, or at least Dispose has to be called.
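
Several of the costs listed above come from the way a foreach over an IEnumerable<T> is compiled. The sketch below writes that expansion out by hand:

  • a GetEnumerator call
  • interface calls to MoveNext and Current
  • a delegate call per element
  • Dispose in a finally block

It approximates the compiler's output and is not its literal form. The Ninja class and the names EnumeratorSketch and FindFirst are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Ninja type.
public class Ninja
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class EnumeratorSketch
{
    // Roughly what `foreach (var ninja in ninjas) { if (predicate(ninja)) return ninja; }`
    // compiles to when `ninjas` is typed as IEnumerable<Ninja>.
    public static Ninja FindFirst(IEnumerable<Ninja> ninjas, Func<Ninja, bool> predicate)
    {
        IEnumerator<Ninja> e = ninjas.GetEnumerator(); // usually allocates an enumerator object
        try
        {
            while (e.MoveNext())             // interface call
            {
                Ninja current = e.Current;   // interface call
                if (predicate(current))      // delegate call
                    return current;
            }
            return null;
        }
        finally
        {
            e.Dispose();                     // IEnumerator<T> implements IDisposable
        }
    }
}
```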

When performance is a concern, also try using for instead of foreach.
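
Below is a minimal for-loop version of the question's lookup. It assumes Ninjas is a List<Ninja>; the question does not say which collection type it is, and Ninja here is a stand-in class. Indexing directly avoids the enumerator object, the delegate and the closure.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the question's Ninja type.
public class Ninja
{
    public int Id { get; set; }
}

public static class ForLoopSketch
{
    // Index-based lookup: no enumerator object, no delegate, no closure.
    public static Ninja FindNinjaById(List<Ninja> ninjas, int ninjaId)
    {
        for (int i = 0; i < ninjas.Count; i++)
        {
            if (ninjas[i].Id == ninjaId)
                return ninjas[i];
        }
        return null; // same "not found" result as the original foreach version
    }
}
```

Note that foreach over a variable whose static type is List<T> already uses a struct enumerator and does not allocate. The gain from for is therefore usually smaller there than when iterating through an IEnumerable<T>, so measure before switching.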

Again, I love LINQ, and I can't remember ever deciding not to use a LINQ (to Objects) query because of performance. So don't do any premature optimization. Start with the most readable solution first, then optimize when needed. So: profile, profile, profile.

#3


6  

Profile


The only way to know for sure is to profile. Yes, certain queries can be slower. But if you look at what ReSharper has replaced here, it's essentially the same thing done in a different manner: the ninjas are iterated over and each Id is checked. If anything, you could argue this refactoring comes down to readability. Which of the two do you find easier to read?

Larger data sets will certainly have a bigger impact, but as I've said: profile. It's the only way to be sure whether such changes have a negative effect.
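
If a full profiler is not at hand, System.Diagnostics.Stopwatch gives a rough first comparison. The sketch below is deliberately simple: one warm-up call, no statistics and no control over the GC. For decisions that matter, a profiler or a benchmarking library such as BenchmarkDotNet gives far more trustworthy numbers. The Ninja class and the helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the question's Ninja type.
public class Ninja
{
    public int Id { get; set; }
}

public static class QuickTiming
{
    // Runs `action` `iterations` times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // one warm-up call so first-call JIT compilation is mostly excluded
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.Elapsed;
    }

    // The question's original foreach version.
    public static Ninja FindWithLoop(List<Ninja> ninjas, int id)
    {
        foreach (var ninja in ninjas)
            if (ninja.Id == id) return ninja;
        return null;
    }

    // The ReSharper-suggested LINQ version.
    public static Ninja FindWithLinq(List<Ninja> ninjas, int id) =>
        ninjas.FirstOrDefault(ninja => ninja.Id == id);
}
```

For example, fill a List<Ninja> with 1,000 items. Then compare QuickTiming.Time(() => QuickTiming.FindWithLoop(list, 999), 100_000) against the same call with FindWithLinq.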

#4


6  

One thing we identified as a performance problem is creating lots of lambdas and iterating over small collections. What happens in the converted sample?

Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId)

First, a new instance of the (compiler-generated) closure type is created: a new object on the managed heap, and some work for the GC. Second, a new delegate instance is created from the method in that closure. Then FirstOrDefault is called. What does it do? It iterates over the collection (just like your original code) and calls the delegate for each element.

So basically, four things are added here:
  1. Creating the closure
  2. Creating the delegate
  3. Calling through the delegate
  4. Collecting the closure and the delegate
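
Those four steps can be written out by hand. The sketch below approximates what the C# compiler generates for Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId). The captured ninjaId is hoisted into a field of a closure class, and a delegate is created over that object's method. The real generated class has a compiler-reserved name, so ClosureForNinjaId, DesugaredLookup and the Ninja class are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Ninja type.
public class Ninja
{
    public int Id { get; set; }
}

// Readable stand-in for the compiler-generated closure ("display") class.
public sealed class ClosureForNinjaId
{
    public int ninjaId; // the captured variable becomes a field

    public bool Matches(Ninja ninja) => ninja.Id == ninjaId; // the lambda body
}

public static class DesugaredLookup
{
    public static Ninja FindNinjaById(IEnumerable<Ninja> ninjas, int ninjaId)
    {
        var closure = new ClosureForNinjaId { ninjaId = ninjaId };            // 1. create closure (heap allocation)
        Func<Ninja, bool> predicate = new Func<Ninja, bool>(closure.Matches); // 2. create delegate (another allocation)
        // 3. FirstOrDefault calls through the delegate for each element;
        // 4. both objects later become garbage for the GC to collect.
        return ninjas.FirstOrDefault(predicate);
    }
}
```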

If you call FindNinjaById many times, this can add up to a significant performance hit. Of course, measure it.

If you replace it with the equivalent

Ninjas.Where(ninja => ninja.Id == ninjaId).FirstOrDefault()

it adds a fifth: creating a state machine for the iterator (Where is a "yielding" function).
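
The yield-based Where sketched below shows where the fifth item comes from. For an iterator method, the C# compiler generates a hidden state-machine class. Each call therefore allocates one more object, and no filtering happens until the result is enumerated. MyWhere is a hypothetical name. Current .NET versions implement Enumerable.Where with hand-written iterator classes rather than yield, but a call generally still allocates an iterator object.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorSketch
{
    // The compiler turns this method into a state-machine class; calling it
    // only creates that object, and the loop body runs lazily on enumeration.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }
}
```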

#5


5  

We've built massive apps, with LINQ sprinkled liberally throughout. It's never, ever slowed us down.

It's perfectly possible to write LINQ queries that will be very slow, but it's easier to fix simple LINQ statements than enormous for/if/for/return algorithms.

Take ReSharper's advice :)

#6


4  

An anecdote: when I was just getting to know C# 3.0 and LINQ, I was still in my "when you have a hammer, everything looks like a nail" phase. As a school assignment, I had to write a Connect Four (four-in-a-row) game as an exercise in adversarial search algorithms. I used LINQ throughout the program. In one particular case, I needed to find the row a game piece would land on if I dropped it into a particular column. A perfect use case for a LINQ query! This turned out to be really slow. However, LINQ wasn't the problem; the problem was that I was searching at all. I optimized this by keeping a look-up table: an integer array containing the row number for every column of the game board, updated whenever a game piece is inserted. Needless to say, this was much, much faster.

Lesson learned: optimize your algorithm first, and high-level constructs like LINQ might actually make that easier.
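
The look-up table from the anecdote might look like the sketch below. Instead of searching a column for its first free cell, the board keeps the next free row for each column and updates it on every drop. Finding the landing row then becomes a single array read. The 6-row by 7-column board size and all names are assumptions, not the original assignment code.

```csharp
using System;

public sealed class ConnectFourBoard
{
    public const int Rows = 6;
    public const int Columns = 7;

    private readonly int[,] cells = new int[Rows, Columns]; // 0 = empty, 1/2 = player
    private readonly int[] nextFreeRow = new int[Columns];  // where the next piece lands (0 = bottom row)

    // O(1) look-up instead of scanning the column; -1 means the column is full.
    public int LandingRow(int column) =>
        nextFreeRow[column] < Rows ? nextFreeRow[column] : -1;

    public int CellAt(int row, int column) => cells[row, column];

    // Places a piece and keeps the look-up table in sync.
    public int Drop(int column, int player)
    {
        int row = LandingRow(column);
        if (row < 0) throw new InvalidOperationException("Column is full.");
        cells[row, column] = player;
        nextFreeRow[column]++;
        return row;
    }
}
```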

That said, there is a definite cost to creating all those delegates. On the other hand, LINQ's lazy nature can also bring a performance benefit. If you manually loop over a collection, you're pretty much forced to create intermediate List<T>s, whereas with LINQ you basically stream the results.
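
The sketch below shows this difference. It assumes the hand-written version is composed of stages that each build a List<T>; a single hand-written loop could also stop early. Both methods find the first square above a threshold and report how many elements were squared. The names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyVsEager
{
    // Stage-by-stage style: the "square" stage builds a complete
    // intermediate List<int> before the "find" stage runs.
    public static (int Result, int Transformed) EagerFirstSquareAbove(IEnumerable<int> source, int threshold)
    {
        int transformed = 0;
        var squares = new List<int>();
        foreach (var x in source)
        {
            transformed++;
            squares.Add(x * x);
        }
        foreach (var s in squares)
        {
            if (s > threshold)
                return (s, transformed);
        }
        return (-1, transformed);
    }

    // LINQ style: elements are pulled through Select and Where one at a time,
    // so squaring stops as soon as First finds a match; no list is built.
    public static (int Result, int Transformed) LazyFirstSquareAbove(IEnumerable<int> source, int threshold)
    {
        int transformed = 0;
        int result = source
            .Select(x => { transformed++; return x * x; })
            .Where(s => s > threshold)
            .DefaultIfEmpty(-1) // -1 signals "no match", mirroring the eager version
            .First();
        return (result, transformed);
    }
}
```

With the input 1 to 10 and a threshold of 10, both return 16. The eager version squares all ten numbers, while the lazy query stops after the fourth.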

#7


3  

The two versions above do exactly the same thing.

As long as you use LINQ queries correctly, you will not suffer from performance issues. Used correctly, LINQ is, if anything, more likely to be faster, thanks to the skill of the people who built it.

The only benefits of writing your own loop are full control, functionality that LINQ does not offer, or better debuggability.

#8


3  

The cool thing about LINQ queries is that they are dead simple to convert into parallel queries (PLINQ). Depending on what you're doing, this may or may not be faster (as always, profile), but it's pretty neat nonetheless.
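
Converting a LINQ to Objects query to PLINQ is usually a matter of inserting AsParallel(), as in the sketch below. The query itself is a made-up example. Sum does not depend on element order; order-sensitive results would also need AsOrdered(). Whether the parallel version is faster depends on the amount of work per element, so profile it.

```csharp
using System.Linq;

public static class PlinqSketch
{
    // Sequential LINQ to Objects query.
    public static long SumOfMultiplesOfThree(int n) =>
        Enumerable.Range(1, n).Where(x => x % 3 == 0).Select(x => (long)x).Sum();

    // The same query under PLINQ: AsParallel() is the only change.
    public static long SumOfMultiplesOfThreeParallel(int n) =>
        Enumerable.Range(1, n).AsParallel().Where(x => x % 3 == 0).Select(x => (long)x).Sum();
}
```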

#9


3  

To add my own experience of using LINQ where performance really does matter (with MonoTouch): the difference is still insignificant there.

On the iPhone 3GS you're 'handicapped' to around 46 MB of RAM and a 620 MHz ARM processor. Admittedly, the code is AOT-compiled. But even on the simulator, where it is JIT-compiled and goes through a long series of indirections, the difference is tenths of a millisecond for sets of thousands of objects.

Along with Windows Mobile, this is where you have to worry about performance costs, not in huge ASP.NET applications running on quad-core 8 GB servers or on dual-core desktops. One exception would be large object sets, although arguably you would lazy-load those anyway, and the initial query would be performed on the database server.

It's a bit of a cliché on Stack Overflow, but use the shorter, more readable code until hundreds of milliseconds really do matter.
