EF生成的查询执行时间太长

时间:2022-04-14 09:43:59

I have a very simple query which is generated by Entity-Framework, Sometimes when I try to run this query It almost takes more than 30 seconds to be executed, and I got time out Exception.

我有一个非常简单的查询,它是由实体-框架生成的,有时当我尝试运行这个查询时,几乎需要30秒以上的时间来执行,我得到了time out Exception。

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM ( SELECT [Extent1].[LinkID] AS [LinkID], [Extent1].[Title] AS [Title], [Extent1].[Url] AS [Url], [Extent1].[Description] AS [Description], [Extent1].[SentDate] AS [SentDate], [Extent1].[VisitCount] AS [VisitCount], [Extent1].[RssSourceId] AS [RssSourceId], [Extent1].[ReviewStatus] AS [ReviewStatus], [Extent1].[UserAccountId] AS [UserAccountId], [Extent1].[CreationDate] AS [CreationDate], row_number() OVER (ORDER BY [Extent1].[SentDate] DESC) AS [row_number]
    FROM [dbo].[Links] AS [Extent1]
)  AS [Extent1]
WHERE [Extent1].[row_number] > 0
ORDER BY [Extent1].[SentDate] DESC

And the code which is generating the Query is:

生成查询的代码是:

public async Task<IQueryable<TEntity>> GetAsync(Expression<Func<TEntity, bool>> filter = null,
    Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy = null)
{
    return await Task.Run(() =>
    {
        IQueryable<TEntity> query = _dbSet;
        if (filter != null)
        {
            query = query.Where(filter);
        }

        if (orderBy != null)
        {
            query = orderBy(query);
        }

        return query;
    });
}

Note that when I remove inner Select statement and Where clause and change it to following, Query executes fine in a less than a second.

注意,当我删除内部Select语句和Where子句并将其更改为following时,查询在不到一秒钟的时间内执行得很好。

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
.
.
.
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

Any advice will be helpful.

任何建议都是有用的。

UPDATE:

更新:

Here is the usage of Above code:

以下是上述代码的用法:

var dbLinks = await _uow.LinkRespository.GetAsync(filter, orderBy);
var pagedLinks = new PagedList<Link>(dbLinks, pageNumber, PAGE_SIZE);
var vmLinks = Mapper.Map<IPagedList<LinkViewItemViewModel>>(pagedLinks);

And filter:

和过滤:

var result = await GetLinks(null, pageNo, a => a.OrderByDescending(x => x.SentDate));

11 个解决方案

#1


5  

I'm guessing the WHERE row_number > 0 will change over time as you ask for page 2, page 3, etc...

我猜row_number >随着时间的推移会随着你要求第二页,第三页等的变化而变化……

As such, I'm curious if it would help to create this index:

因此,我很想知道它是否有助于建立这个指数:

CREATE INDEX idx_links_SentDate_desc ON [dbo].[Links] ([SentDate] DESC)

In all honesty, IF it works, it's pretty much a band-aid and you'll probably will need to rebuild this index on a frequent basis as I'm guessing it will get fragmented over time...

坦白地说,如果它真的管用的话,那基本上就是个权宜之计,你可能需要频繁地重新构建这个指数,因为我猜随着时间的推移,它会变得支离破碎……

UPDATE: check the comments! Turns out the DESC has no effect whatsoever and should be avoided if your data comes in low to high!

更新:检查评论!原来DESC没有任何影响,应该避免如果你的数据从低到高!

#2


9  

It never occurred to me that you simply didn't have an index. Lesson learnt - always check the basics before digging further.

我从没想过你根本就没有索引。吸取的教训——在深入挖掘之前,一定要检查一下基础知识。


If you don't need pagination, then the query can be simplified to

如果不需要分页,那么可以将查询简化为

SELECT TOP (10) 
    [Extent1].[LinkID] AS [LinkID], 
    [Extent1].[Title] AS [Title], 
    ...
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

and it runs fast, as you've verified.

正如你所证实的,它运行得很快。

Apparently, you do need the pagination, so let's see what we can do.

显然,您确实需要分页,所以让我们看看我们能做什么。

The reason why your current version is slow, because it scans the whole table first, calculates row number for each and every row and only then returns 10 rows. I was wrong here. SQL Server optimizer is pretty smart. The root of your problem is somewhere else. See my update below.

当前版本的速度很慢,因为它首先扫描整个表,计算每一行的行号,然后返回10行。我错了。SQL Server优化器非常聪明。你问题的根源在别处。看我下面的更新。


BTW, as other people mentioned, this pagination will work correctly only if SentDate column is unique. If it is not unique, you need to ORDER BY SentDate and another unique column like some ID to resolve ambiguity.

顺便说一下,正如其他人提到的,只有在SentDate列是唯一的情况下,这个分页才会正常工作。如果它不是唯一的,您需要通过SentDate和另一个独特的列(比如一些ID)来解决歧义。

If you don't need ability to jump straight to particular page, but rather always start with page 1, then go to next page, next page and so on, then the proper efficient way to do such pagination is described in this excellent article: http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way The author uses PostgreSQL for illustration, but the technique works for MS SQL Server as well. It boils down to remembering the ID of the last row on the shown page and then using this ID in the WHERE clause with appropriate supporting index to retrieve the next page without scanning all previous rows.

如果你不需要直接跳到特定页面的能力,而是总是从第1页开始,然后去下一个页面,下一个页面,然后适当有效的方法做分页中描述这样的优秀文章:http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way作者使用PostgreSQL的插图,但这项技术适用于MS SQL Server。它可归纳为在显示的页面上记住最后一行的ID,然后在WHERE子句中使用该ID,并使用适当的支持索引检索下一页,而不扫描所有前面的行。

SQL Server 2008 doesn't have a built-in support for pagination, so we'll have to use workaround. I will show one variant that allows to jump straight to a given page and would work fast for first pages, but would become slower and slower for further pages.

SQL Server 2008没有对分页的内置支持,因此我们必须使用变通方法。我将展示一种变体,它允许直接跳转到一个给定的页面,并且在第一个页面上工作速度很快,但是在以后的页面上速度会越来越慢。

You will have these variables (PageSize, PageNumber) in your C# code. I put them here to illustrate the point.

您将在c#代码中拥有这些变量(PageSize, PageNumber)。我把它们放在这里来说明这一点。

DECLARE @VarPageSize int = 10; -- number of rows in each page
DECLARE @VarPageNumber int = 3; -- page numeration is zero-based

SELECT TOP (@VarPageSize)
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM [dbo].[Links] AS [Extent1]
        ORDER BY [Extent1].[SentDate] DESC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] ASC
;

The first page is rows 1 to 10, second page is 11 to 20 and so on. Let's see how this query works when we try to get the fourth page, i.e. rows 31 to 40. PageSize=10, PageNumber=3. In the inner query we select first 40 rows. Note, that we don't scan the whole table here, we scan only first 40 rows. We don't even need explicit ROW_NUMBER(). Then we need to select last 10 rows out of those found 40, so outer query selects TOP(10) with ORDER BY in the opposite direction. As is this will return rows 40 to 31 in reverse order. You can sort them back into correct order on the client, or add one more outer query, which simply sorts them again by SentDate DESC. Like this:

第一页是行1到10,第二页是11到20等等。让我们看看当我们尝试获取第4页(即第31到40行)时这个查询是如何工作的。页大小= 10,PageNumber = 3。在内部查询中,我们选择前40行。注意,我们没有扫描整个表,我们只扫描前40行。我们甚至不需要显式的ROW_NUMBER()。然后我们需要从找到的40行中选择最后10行,因此外部查询选择TOP(10),顺序相反。这将以相反的顺序返回第40行到第31行。您可以在客户端上按正确的顺序对它们进行排序,或者再添加一个外部查询,通过SentDate DESC对它们进行排序。

SELECT
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP (@VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM
            (
                SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
                    [Extent1].[LinkID] AS [LinkID]
                    ,[Extent1].[Title] AS [Title]
                    ,[Extent1].[Url] AS [Url]
                    ,[Extent1].[Description] AS [Description]
                    ,[Extent1].[SentDate] AS [SentDate]
                    ,[Extent1].[VisitCount] AS [VisitCount]
                    ,[Extent1].[RssSourceId] AS [RssSourceId]
                    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
                    ,[Extent1].[UserAccountId] AS [UserAccountId]
                    ,[Extent1].[CreationDate] AS [CreationDate]
                FROM [dbo].[Links] AS [Extent1]
                ORDER BY [Extent1].[SentDate] DESC
            ) AS [Extent1]
        ORDER BY [Extent1].[SentDate] ASC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

This query (as original query) would work always correctly only if SentDate is unique. If it is not unique, add unique column to the ORDER BY. For example, if LinkID is unique, then in the inner-most query use ORDER BY SentDate DESC, LinkID DESC. In the outer query reverse the order: ORDER BY SentDate ASC, LinkID ASC.

这个查询(作为原始查询)只有在SentDate是唯一的情况下才能正常工作。如果它不是唯一的,请在ORDER BY中添加惟一的列。例如,如果LinkID是唯一的,那么在最内部的查询中,使用SentDate DESC、LinkID DESC的顺序。

Obviously, if you want to jump to page 1000, then the inner query would have to read 10,000 rows, so the further you go, the slower it gets.

显然,如果您想跳转到第1000页,那么内部查询将必须读取10,000行,因此您越走越慢。

In any case, you need to have an index on SentDate (or SentDate, LinkID) to make it work. Without an index the query would scan the whole table again.

在任何情况下,您都需要在SentDate(或SentDate, LinkID)上建立一个索引才能使它正常工作。如果没有索引,查询将再次扫描整个表。

I'm not telling you here how to translate this query to EF, because I don't know. I never used EF. There may be a way. Also, apparently, you can just force it to use actual SQL, rather than trying to play with C# code.

我没有告诉你们如何将这个查询转换为EF,因为我不知道。我从未使用过英孚。也许是有办法的。而且,显然,您可以强制它使用实际的SQL,而不是尝试使用c#代码。

Update

Execution plans comparison

执行计划的比较

In my database I have a table EventLogErrors with 29,477,859 rows and I compared on SQL Server 2008 the query with ROW_NUMBER that EF generates and what I suggested here with TOP. I tried to retrieve the fourth page 10 rows long. In both cases optimizer was smart enough to read only 40 rows, as you can see from the execution plans. I used a primary key column for ordering and pagination for this test. When I used another indexed column for pagination results were the same, i.e. both variants read only 40 rows. Needless to say, both variants returned results in a fraction of a second.

在我的数据库中,我有一个包含29,477,859行的表EventLogErrors,我在SQL Server 2008上比较了EF生成的ROW_NUMBER查询以及我在这里建议的TOP。我试图检索第4页10行。在这两种情况下,优化器都足够聪明,只能读取40行,您可以从执行计划中看到这一点。我使用了一个主键列来为这个测试排序和分页。当我对分页结果使用另一个索引列时,它们是相同的,也就是说,两个变量只读取40行。不用说,这两种变体在一秒钟内都返回了结果。

Variant with TOP

变体与高层

EF生成的查询执行时间太长

Variant with ROW_NUMBER

变体与ROW_NUMBER

EF生成的查询执行时间太长

What it all means is that the root of your problem is somewhere else. You mentioned that your query runs slowly only sometimes and I didn't really pay attention to it originally. With such symptom I would do the following:

它的意思是你问题的根源在别处。您提到您的查询有时运行缓慢,而我最初并没有真正注意到它。有了这样的症状,我将做以下的事:

  • Check execution plan.
  • 检查执行计划。
  • Check that you do have an index.
  • 检查你是否有索引。
  • Check that the index is not heavily fragmented and statistics is not outdated.
  • 检查索引是否没有严重的碎片化和统计数据是否过时。
  • The SQL Server has a feature called Auto-Parameterization. Also, it has a feature called Parameter Sniffing. Also, it has a feature called Execution plan caching. When all three features work together it may result in using a non-optimal execution plan. There is an excellent article by Erland Sommarskog explaining it in detail: http://www.sommarskog.se/query-plan-mysteries.html This article explains how to confirm that the problem is really in parameter sniffing by checking the cached execution plan and what can be done to fix the problem.
  • SQL服务器有一个称为自动参数化的特性。此外,它还有一个称为参数嗅探的特性。此外,它还有一个称为执行计划缓存的特性。当这三个特性一起工作时,可能会导致使用非最优执行计划。Erland Sommarskog有一篇优秀的文章详细解释了它:http://www.sommarskog.se/queryplan-mystery .html这篇文章解释了如何通过检查缓存的执行计划来确认问题确实存在于参数嗅探中,以及如何解决问题。

#3


2  

Sometimes the inner select can cause problems with the execution plan, but it's the easiest way for the expression tree to be built from the code. Usually, it won't affect performance too much.

有时候内部选择会导致执行计划出现问题,但是从代码中构建表达式树是最简单的方法。通常,它不会对性能产生太大的影响。

Clearly in this case it does. One workaround is to use your own query with ExecuteStoreQuery. Something like this:

显然,在这种情况下。一个解决方案是使用自己的查询和ExecuteStoreQuery。是这样的:

int takeNo = 20;
int skipNo = 100;

var results = db.ExecuteStoreQuery<Link>(
    "SELECT LinkID, Title, Url, Description, SentDate, VisitCount, RssSourceId, ReviewStatus, UserAccountId, CreationDate FROM Links", 
    null);

results = results.OrderBy(x=> x.SentDate).Skip(skipNo).Take(takeNo);

Of course you lose a lot of the benefits of using an ORM in the first place by doing this, but it might be acceptable for an exceptional circumstance.

当然,这样做首先会使您失去使用ORM的许多好处,但是在特殊情况下它可能是可以接受的。

#4


1  

This looks like a standard paging query. I would guess that you do not have an index on SentDate. If so, the first thing to try is adding an index on SentDate and seeing what kind of impact this has on performance. Assuming that you do not always want to sort/page on SentDate and that indexing every column that you might want to sort/page by is not going to happen, take a look at this other * question. In some cases, SQL Server's "Gather Streams" parallelism operation can overflow into TempDb. When this happens, performance goes into the toilet. As the other answer says, Indexing the column can help, as can disabling parallelism. Check out your query plan and see if it looks like this might be the issue.

这看起来像一个标准的分页查询。我猜你在句子上没有索引。如果是这样,第一步就是在SentDate上添加一个索引,看看这对性能有什么影响。假设您并不总是希望在SentDate上排序/页,并且不会对每个可能要排序/页的列进行索引,那么看看另一个*问题。在某些情况下,SQL Server的“收集流”并行操作可能会溢出到TempDb中。当这种情况发生时,表演就进入了厕所。正如另一个答案所说,索引列可以帮助,因为可以禁用并行性。检查您的查询计划,看看这是否可能是问题所在。

#5


1  

I am not very good in EF but can give you hints. First of all you have to check if you have an non-clustered index on [Extent1].[SentDate]. Second if not, create, if exists, then recreate or re-arrange it.

我在英孚的表现不太好,但可以给你一些提示。首先,您必须检查在[Extent1].[SentDate]上是否有非聚集索引。其次,如果没有,创建(如果存在),然后重新创建或重新安排它。

Third change your query like this. As your original SQL is nothing just written un-necessary complex and it would result same as this one I am showing here. Try to write things simple, will work faster and maintenance would also be easy.

第三,像这样更改查询。由于您最初的SQL不是简单地编写不必要的复杂语句,因此它的结果将与我在这里展示的这个相同。试着把事情写得简单些,工作起来会更快,维护起来也会更容易。

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

or modify this one little bit like this if case it result different.

或者像这样修改一下如果情况不同的话。

select top 10 A.* from (
SELECT * from
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[Links] AS [Extent1] ) A
ORDER BY A.[SentDate] DESC 

I am 99% sure it will work.

99%的人相信它会成功。

#6


0  

Have you tried chaining in the method?

你试过用链接法吗?

        IQueryable<TEntity> query = _dbSet;
        return query.Where(x => (filter != null ? filter : x)
                    .Where(x => (orderBy != null ? orderBy : x));

I am wondering if this will change the query that is created by EF.

我想知道这是否会改变EF创建的查询。

#7


0  

Your code looks like somewhat obscure for me, And this is first time where I encounter such this querying. As you told, sometimes It takes too long time to execute, so it tells the query can be interpreted in another ways somewhere, perhaps by ignoring EF performance considerations in some cases, So try to rearrange query conditions/selections and consider lazy loading in your program logic.

您的代码对我来说有点晦涩,这是我第一次遇到这样的查询。正如您所讲的,有时执行起来花费的时间太长,因此它告诉查询可以以另一种方式来解释,可能在某些情况下忽略了EF性能考虑,所以尝试重新安排查询条件/选择,并考虑程序逻辑中的惰性加载。

#8


0  

Aren't you bitten by the Statistic update problem in SQL server?

您没有被SQL server中的统计更新问题困扰吗?

ALTER DATABASE YourDBName SET AUTO_UPDATE_STATISTICS_ASYNC ON

将数据库设置为AUTO_UPDATE_STATISTICS_ASYNC

Default is OFF, thereby your SQL server will stall when 20% of your data has changed - waiting for the Statistics update before running the query.

默认值为OFF,因此当20%的数据发生更改时,SQL服务器将停止工作——在运行查询之前等待统计数据更新。

#9


0  

I have run into similar issues before where EF will decide to decorate the SQL it decides to run in a very non-performant fashion.

我之前遇到过类似的问题,EF决定用一种非常无性能的方式运行SQL。

Anyways, to provide a possible solution to your question:

无论如何,为你的问题提供一个可能的解决方案:

On instances where I don't like what EF does with my code to generate SQL statements, I end up writing a stored procedure, import that into my EDMX as a function and use that to retrieve my data. It affords me control on how to formulate the SQL and I know exactly what index I need to leverage to get the best performance out of this. I imagine you know how to write a stored proc and import that as a function into EF so I will leave those details out. Hope this helps you.

如果我不喜欢EF用我的代码生成SQL语句,我就会编写一个存储过程,将它作为函数导入到我的EDMX中,并使用它检索我的数据。它使我能够控制如何构造SQL,并且我确切地知道需要利用什么索引才能获得最佳性能。我想你应该知道如何编写一个存储的proc,并将其作为函数导入EF,因此我将把这些细节排除在外。希望这能帮助你。

I will still keep checking this page to see if someone comes up with a nicer, less painful solution to your issue.

我将继续检查这一页,看看是否有人提出一个更好的,不那么痛苦的解决方案你的问题。

#10


0  

Call me crazy, but it looks like you've got the thing ordering itself with itself when this code is called:

你可以说我疯了,但是当这个代码被调用的时候,看起来你已经有了自排序的东西:

if (orderBy != null)
{
    query = orderBy(query);
}

I think that would explain the whole "sometimes it's slow" bit. Probably runs fine until you have something in the orderBy parameter, then it's calling itself and creating that row numbered sub-select that slows it down.

我认为这可以解释“有时它很慢”这一点。可能运行良好,直到您在orderBy参数中有一些内容,然后它调用自己并创建行编号的子选择,从而减慢它的速度。

Try commenting out the query = orderBy(query) portion of your code and see if you still get the slow down. I'm betting that you won't.

请尝试注释掉代码的query = orderBy(查询)部分,看看是否仍然会减慢速度。我打赌你不会。

Also, you can simplify your code using Dynamic LINQ. It basically lets you specific sorting with a string name of a field (.orderby("somefield")) instead of trying to pass in a method, which I've found to be a lot easier. I use that in MVC apps to handle sorting by whatever field the users clicks on a grid.

此外,您还可以使用动态LINQ简化代码。它基本上允许您使用字段的字符串名称(.orderby(“somefield”))进行特定的排序,而不是尝试传入方法,我发现这要简单得多。我在MVC应用程序中使用它来处理用户在网格上单击的任何字段的排序。

#11


0  

Try adding a non-clustered index on SentDate

尝试在SentDate上添加非聚集索引

#1


5  

I'm guessing the WHERE row_number > 0 will change over time as you ask for page 2, page 3, etc...

我猜row_number >随着时间的推移会随着你要求第二页,第三页等的变化而变化……

As such, I'm curious if it would help to create this index:

因此,我很想知道它是否有助于建立这个指数:

CREATE INDEX idx_links_SentDate_desc ON [dbo].[Links] ([SentDate] DESC)

In all honesty, IF it works, it's pretty much a band-aid and you'll probably will need to rebuild this index on a frequent basis as I'm guessing it will get fragmented over time...

坦白地说,如果它真的管用的话,那基本上就是个权宜之计,你可能需要频繁地重新构建这个指数,因为我猜随着时间的推移,它会变得支离破碎……

UPDATE: check the comments! Turns out the DESC has no effect whatsoever and should be avoided if your data comes in low to high!

更新:检查评论!原来DESC没有任何影响,应该避免如果你的数据从低到高!

#2


9  

It never occurred to me that you simply didn't have an index. Lesson learnt - always check the basics before digging further.

我从没想过你根本就没有索引。吸取的教训——在深入挖掘之前,一定要检查一下基础知识。


If you don't need pagination, then the query can be simplified to

如果不需要分页,那么可以将查询简化为

SELECT TOP (10) 
    [Extent1].[LinkID] AS [LinkID], 
    [Extent1].[Title] AS [Title], 
    ...
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

and it runs fast, as you've verified.

正如你所证实的,它运行得很快。

Apparently, you do need the pagination, so let's see what we can do.

显然,您确实需要分页,所以让我们看看我们能做什么。

The reason why your current version is slow, because it scans the whole table first, calculates row number for each and every row and only then returns 10 rows. I was wrong here. SQL Server optimizer is pretty smart. The root of your problem is somewhere else. See my update below.

当前版本的速度很慢,因为它首先扫描整个表,计算每一行的行号,然后返回10行。我错了。SQL Server优化器非常聪明。你问题的根源在别处。看我下面的更新。


BTW, as other people mentioned, this pagination will work correctly only if SentDate column is unique. If it is not unique, you need to ORDER BY SentDate and another unique column like some ID to resolve ambiguity.

顺便说一下,正如其他人提到的,只有在SentDate列是唯一的情况下,这个分页才会正常工作。如果它不是唯一的,您需要通过SentDate和另一个独特的列(比如一些ID)来解决歧义。

If you don't need ability to jump straight to particular page, but rather always start with page 1, then go to next page, next page and so on, then the proper efficient way to do such pagination is described in this excellent article: http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way The author uses PostgreSQL for illustration, but the technique works for MS SQL Server as well. It boils down to remembering the ID of the last row on the shown page and then using this ID in the WHERE clause with appropriate supporting index to retrieve the next page without scanning all previous rows.

如果你不需要直接跳到特定页面的能力,而是总是从第1页开始,然后去下一个页面,下一个页面,然后适当有效的方法做分页中描述这样的优秀文章:http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way作者使用PostgreSQL的插图,但这项技术适用于MS SQL Server。它可归纳为在显示的页面上记住最后一行的ID,然后在WHERE子句中使用该ID,并使用适当的支持索引检索下一页,而不扫描所有前面的行。

SQL Server 2008 doesn't have a built-in support for pagination, so we'll have to use workaround. I will show one variant that allows to jump straight to a given page and would work fast for first pages, but would become slower and slower for further pages.

SQL Server 2008没有对分页的内置支持,因此我们必须使用变通方法。我将展示一种变体,它允许直接跳转到一个给定的页面,并且在第一个页面上工作速度很快,但是在以后的页面上速度会越来越慢。

You will have these variables (PageSize, PageNumber) in your C# code. I put them here to illustrate the point.

您将在c#代码中拥有这些变量(PageSize, PageNumber)。我把它们放在这里来说明这一点。

DECLARE @VarPageSize int = 10; -- number of rows in each page
DECLARE @VarPageNumber int = 3; -- page numeration is zero-based

SELECT TOP (@VarPageSize)
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM [dbo].[Links] AS [Extent1]
        ORDER BY [Extent1].[SentDate] DESC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] ASC
;

The first page is rows 1 to 10, second page is 11 to 20 and so on. Let's see how this query works when we try to get the fourth page, i.e. rows 31 to 40. PageSize=10, PageNumber=3. In the inner query we select first 40 rows. Note, that we don't scan the whole table here, we scan only first 40 rows. We don't even need explicit ROW_NUMBER(). Then we need to select last 10 rows out of those found 40, so outer query selects TOP(10) with ORDER BY in the opposite direction. As is this will return rows 40 to 31 in reverse order. You can sort them back into correct order on the client, or add one more outer query, which simply sorts them again by SentDate DESC. Like this:

第一页是行1到10,第二页是11到20等等。让我们看看当我们尝试获取第4页(即第31到40行)时这个查询是如何工作的。页大小= 10,PageNumber = 3。在内部查询中,我们选择前40行。注意,我们没有扫描整个表,我们只扫描前40行。我们甚至不需要显式的ROW_NUMBER()。然后我们需要从找到的40行中选择最后10行,因此外部查询选择TOP(10),顺序相反。这将以相反的顺序返回第40行到第31行。您可以在客户端上按正确的顺序对它们进行排序,或者再添加一个外部查询,通过SentDate DESC对它们进行排序。

SELECT
    [Extent1].[LinkID] AS [LinkID]
    ,[Extent1].[Title] AS [Title]
    ,[Extent1].[Url] AS [Url]
    ,[Extent1].[Description] AS [Description]
    ,[Extent1].[SentDate] AS [SentDate]
    ,[Extent1].[VisitCount] AS [VisitCount]
    ,[Extent1].[RssSourceId] AS [RssSourceId]
    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
    ,[Extent1].[UserAccountId] AS [UserAccountId]
    ,[Extent1].[CreationDate] AS [CreationDate]
FROM
    (
        SELECT TOP (@VarPageSize)
            [Extent1].[LinkID] AS [LinkID]
            ,[Extent1].[Title] AS [Title]
            ,[Extent1].[Url] AS [Url]
            ,[Extent1].[Description] AS [Description]
            ,[Extent1].[SentDate] AS [SentDate]
            ,[Extent1].[VisitCount] AS [VisitCount]
            ,[Extent1].[RssSourceId] AS [RssSourceId]
            ,[Extent1].[ReviewStatus] AS [ReviewStatus]
            ,[Extent1].[UserAccountId] AS [UserAccountId]
            ,[Extent1].[CreationDate] AS [CreationDate]
        FROM
            (
                SELECT TOP((@VarPageNumber + 1) * @VarPageSize)
                    [Extent1].[LinkID] AS [LinkID]
                    ,[Extent1].[Title] AS [Title]
                    ,[Extent1].[Url] AS [Url]
                    ,[Extent1].[Description] AS [Description]
                    ,[Extent1].[SentDate] AS [SentDate]
                    ,[Extent1].[VisitCount] AS [VisitCount]
                    ,[Extent1].[RssSourceId] AS [RssSourceId]
                    ,[Extent1].[ReviewStatus] AS [ReviewStatus]
                    ,[Extent1].[UserAccountId] AS [UserAccountId]
                    ,[Extent1].[CreationDate] AS [CreationDate]
                FROM [dbo].[Links] AS [Extent1]
                ORDER BY [Extent1].[SentDate] DESC
            ) AS [Extent1]
        ORDER BY [Extent1].[SentDate] ASC
    ) AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

This query (as original query) would work always correctly only if SentDate is unique. If it is not unique, add unique column to the ORDER BY. For example, if LinkID is unique, then in the inner-most query use ORDER BY SentDate DESC, LinkID DESC. In the outer query reverse the order: ORDER BY SentDate ASC, LinkID ASC.

这个查询(作为原始查询)只有在SentDate是唯一的情况下才能正常工作。如果它不是唯一的,请在ORDER BY中添加惟一的列。例如,如果LinkID是唯一的,那么在最内部的查询中,使用SentDate DESC、LinkID DESC的顺序。

Obviously, if you want to jump to page 1000, then the inner query would have to read 10,000 rows, so the further you go, the slower it gets.

显然,如果您想跳转到第1000页,那么内部查询将必须读取10,000行,因此您越走越慢。

In any case, you need to have an index on SentDate (or SentDate, LinkID) to make it work. Without an index the query would scan the whole table again.

在任何情况下,您都需要在SentDate(或SentDate, LinkID)上建立一个索引才能使它正常工作。如果没有索引,查询将再次扫描整个表。

I'm not telling you here how to translate this query to EF, because I don't know. I never used EF. There may be a way. Also, apparently, you can just force it to use actual SQL, rather than trying to play with C# code.

我没有告诉你们如何将这个查询转换为EF,因为我不知道。我从未使用过英孚。也许是有办法的。而且,显然,您可以强制它使用实际的SQL,而不是尝试使用c#代码。

Update

Execution plans comparison

执行计划的比较

In my database I have a table EventLogErrors with 29,477,859 rows and I compared on SQL Server 2008 the query with ROW_NUMBER that EF generates and what I suggested here with TOP. I tried to retrieve the fourth page 10 rows long. In both cases optimizer was smart enough to read only 40 rows, as you can see from the execution plans. I used a primary key column for ordering and pagination for this test. When I used another indexed column for pagination results were the same, i.e. both variants read only 40 rows. Needless to say, both variants returned results in a fraction of a second.

在我的数据库中,我有一个包含29,477,859行的表EventLogErrors,我在SQL Server 2008上比较了EF生成的ROW_NUMBER查询以及我在这里建议的TOP。我试图检索第4页10行。在这两种情况下,优化器都足够聪明,只能读取40行,您可以从执行计划中看到这一点。我使用了一个主键列来为这个测试排序和分页。当我对分页结果使用另一个索引列时,它们是相同的,也就是说,两个变量只读取40行。不用说,这两种变体在一秒钟内都返回了结果。

Variant with TOP

变体与高层

EF生成的查询执行时间太长

Variant with ROW_NUMBER

变体与ROW_NUMBER

EF生成的查询执行时间太长

What it all means is that the root of your problem is somewhere else. You mentioned that your query runs slowly only sometimes and I didn't really pay attention to it originally. With such symptom I would do the following:

它的意思是你问题的根源在别处。您提到您的查询有时运行缓慢,而我最初并没有真正注意到它。有了这样的症状,我将做以下的事:

  • Check execution plan.
  • 检查执行计划。
  • Check that you do have an index.
  • 检查你是否有索引。
  • Check that the index is not heavily fragmented and statistics is not outdated.
  • 检查索引是否没有严重的碎片化和统计数据是否过时。
  • The SQL Server has a feature called Auto-Parameterization. Also, it has a feature called Parameter Sniffing. Also, it has a feature called Execution plan caching. When all three features work together it may result in using a non-optimal execution plan. There is an excellent article by Erland Sommarskog explaining it in detail: http://www.sommarskog.se/query-plan-mysteries.html This article explains how to confirm that the problem is really in parameter sniffing by checking the cached execution plan and what can be done to fix the problem.
  • SQL服务器有一个称为自动参数化的特性。此外,它还有一个称为参数嗅探的特性。此外,它还有一个称为执行计划缓存的特性。当这三个特性一起工作时,可能会导致使用非最优执行计划。Erland Sommarskog有一篇优秀的文章详细解释了它:http://www.sommarskog.se/queryplan-mystery .html这篇文章解释了如何通过检查缓存的执行计划来确认问题确实存在于参数嗅探中,以及如何解决问题。

#3


2  

Sometimes the inner select can cause problems with the execution plan, but it's the easiest way for the expression tree to be built from the code. Usually, it won't affect performance too much.

有时候内部选择会导致执行计划出现问题,但是从代码中构建表达式树是最简单的方法。通常,它不会对性能产生太大的影响。

Clearly in this case it does. One workaround is to use your own query with ExecuteStoreQuery. Something like this:

显然,在这种情况下。一个解决方案是使用自己的查询和ExecuteStoreQuery。是这样的:

int takeNo = 20;
int skipNo = 100;

var results = db.ExecuteStoreQuery<Link>(
    "SELECT LinkID, Title, Url, Description, SentDate, VisitCount, RssSourceId, ReviewStatus, UserAccountId, CreationDate FROM Links", 
    null);

results = results.OrderBy(x=> x.SentDate).Skip(skipNo).Take(takeNo);

Of course you lose a lot of the benefits of using an ORM in the first place by doing this, but it might be acceptable for an exceptional circumstance.

当然,这样做首先会使您失去使用ORM的许多好处,但是在特殊情况下它可能是可以接受的。

#4


1  

This looks like a standard paging query. I would guess that you do not have an index on SentDate. If so, the first thing to try is adding an index on SentDate and seeing what kind of impact this has on performance. Assuming that you do not always want to sort/page on SentDate and that indexing every column that you might want to sort/page by is not going to happen, take a look at this other * question. In some cases, SQL Server's "Gather Streams" parallelism operation can overflow into TempDb. When this happens, performance goes into the toilet. As the other answer says, Indexing the column can help, as can disabling parallelism. Check out your query plan and see if it looks like this might be the issue.

这看起来像一个标准的分页查询。我猜你在句子上没有索引。如果是这样,第一步就是在SentDate上添加一个索引,看看这对性能有什么影响。假设您并不总是希望在SentDate上排序/页,并且不会对每个可能要排序/页的列进行索引,那么看看另一个*问题。在某些情况下,SQL Server的“收集流”并行操作可能会溢出到TempDb中。当这种情况发生时,表演就进入了厕所。正如另一个答案所说,索引列可以帮助,因为可以禁用并行性。检查您的查询计划,看看这是否可能是问题所在。

#5


1  

I am not very good in EF but can give you hints. First of all you have to check if you have an non-clustered index on [Extent1].[SentDate]. Second if not, create, if exists, then recreate or re-arrange it.

我在英孚的表现不太好,但可以给你一些提示。首先,您必须检查在[Extent1].[SentDate]上是否有非聚集索引。其次,如果没有,创建(如果存在),然后重新创建或重新安排它。

Third change your query like this. As your original SQL is nothing just written un-necessary complex and it would result same as this one I am showing here. Try to write things simple, will work faster and maintenance would also be easy.

第三,像这样更改查询。由于您最初的SQL不是简单地编写不必要的复杂语句,因此它的结果将与我在这里展示的这个相同。试着把事情写得简单些,工作起来会更快,维护起来也会更容易。

SELECT TOP (10) 
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[Links] AS [Extent1]
ORDER BY [Extent1].[SentDate] DESC

or modify this one little bit like this if case it result different.

或者像这样修改一下如果情况不同的话。

select top 10 A.* from (
SELECT * from
[Extent1].[LinkID] AS [LinkID], 
[Extent1].[Title] AS [Title], 
[Extent1].[Url] AS [Url], 
[Extent1].[Description] AS [Description], 
[Extent1].[SentDate] AS [SentDate], 
[Extent1].[VisitCount] AS [VisitCount], 
[Extent1].[RssSourceId] AS [RssSourceId], 
[Extent1].[ReviewStatus] AS [ReviewStatus], 
[Extent1].[UserAccountId] AS [UserAccountId], 
[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[Links] AS [Extent1] ) A
ORDER BY A.[SentDate] DESC 

I am 99% sure it will work.

99%的人相信它会成功。

#6


0  

Have you tried chaining in the method?

你试过用链接法吗?

        IQueryable<TEntity> query = _dbSet;
        return query.Where(x => (filter != null ? filter : x)
                    .Where(x => (orderBy != null ? orderBy : x));

I am wondering if this will change the query that is created by EF.

我想知道这是否会改变EF创建的查询。

#7


0  

Your code looks like somewhat obscure for me, And this is first time where I encounter such this querying. As you told, sometimes It takes too long time to execute, so it tells the query can be interpreted in another ways somewhere, perhaps by ignoring EF performance considerations in some cases, So try to rearrange query conditions/selections and consider lazy loading in your program logic.

您的代码对我来说有点晦涩,这是我第一次遇到这样的查询。正如您所讲的,有时执行起来花费的时间太长,因此它告诉查询可以以另一种方式来解释,可能在某些情况下忽略了EF性能考虑,所以尝试重新安排查询条件/选择,并考虑程序逻辑中的惰性加载。

#8


0  

Aren't you bitten by the Statistic update problem in SQL server?

您没有被SQL server中的统计更新问题困扰吗?

ALTER DATABASE YourDBName SET AUTO_UPDATE_STATISTICS_ASYNC ON

将数据库设置为AUTO_UPDATE_STATISTICS_ASYNC

Default is OFF, thereby your SQL server will stall when 20% of your data has changed - waiting for the Statistics update before running the query.

默认值为OFF,因此当20%的数据发生更改时,SQL服务器将停止工作——在运行查询之前等待统计数据更新。

#9


0  

I have run into similar issues before where EF will decide to decorate the SQL it decides to run in a very non-performant fashion.

我之前遇到过类似的问题,EF决定用一种非常无性能的方式运行SQL。

Anyways, to provide a possible solution to your question:

无论如何,为你的问题提供一个可能的解决方案:

On instances where I don't like what EF does with my code to generate SQL statements, I end up writing a stored procedure, import that into my EDMX as a function and use that to retrieve my data. It affords me control on how to formulate the SQL and I know exactly what index I need to leverage to get the best performance out of this. I imagine you know how to write a stored proc and import that as a function into EF so I will leave those details out. Hope this helps you.

如果我不喜欢EF用我的代码生成SQL语句,我就会编写一个存储过程,将它作为函数导入到我的EDMX中,并使用它检索我的数据。它使我能够控制如何构造SQL,并且我确切地知道需要利用什么索引才能获得最佳性能。我想你应该知道如何编写一个存储的proc,并将其作为函数导入EF,因此我将把这些细节排除在外。希望这能帮助你。

I will still keep checking this page to see if someone comes up with a nicer, less painful solution to your issue.

我将继续检查这一页,看看是否有人提出一个更好的,不那么痛苦的解决方案你的问题。

#10


0  

Call me crazy, but it looks like you've got the thing ordering itself with itself when this code is called:

你可以说我疯了,但是当这个代码被调用的时候,看起来你已经有了自排序的东西:

if (orderBy != null)
{
    query = orderBy(query);
}

I think that would explain the whole "sometimes it's slow" bit. Probably runs fine until you have something in the orderBy parameter, then it's calling itself and creating that row numbered sub-select that slows it down.

我认为这可以解释“有时它很慢”这一点。可能运行良好,直到您在orderBy参数中有一些内容,然后它调用自己并创建行编号的子选择,从而减慢它的速度。

Try commenting out the query = orderBy(query) portion of your code and see if you still get the slow down. I'm betting that you won't.

请尝试注释掉代码的query = orderBy(查询)部分,看看是否仍然会减慢速度。我打赌你不会。

Also, you can simplify your code using Dynamic LINQ. It basically lets you specific sorting with a string name of a field (.orderby("somefield")) instead of trying to pass in a method, which I've found to be a lot easier. I use that in MVC apps to handle sorting by whatever field the users clicks on a grid.

此外,您还可以使用动态LINQ简化代码。它基本上允许您使用字段的字符串名称(.orderby(“somefield”))进行特定的排序,而不是尝试传入方法,我发现这要简单得多。我在MVC应用程序中使用它来处理用户在网格上单击的任何字段的排序。

#11


0  

Try adding a non-clustered index on SentDate

尝试在SentDate上添加非聚集索引