结合表达式而不是在Entity Framework中使用多个查询

时间:2023-01-21 09:41:55

I have following generic queryable (which may already have selections applied):

我有以下通用可查询(可能已经应用了选择):

IQueryable<TEntity> queryable = DBSet<TEntity>.AsQueryable();

Then there is the Provider class that looks like this:

然后是Provider类,如下所示:

public class Provider<TEntity>
{
    public Expression<Func<TEntity, bool>> Condition { get; set; }

    [...]
}

The Condition could be defined per instance in the following fashion:

可以按以下方式为每个实例定义Condition:

Condition = entity => entity.Id == 3;

Now I want to select all Provider instances which have a Condition that is met at least by one entity of the DBSet:

现在我想选择具有至少由DBSet的一个实体满足的Condition的所有Provider实例:

List<Provider> providers = [...];
var matchingProviders = providers.Where(provider => queryable.Any(provider.Condition))

The problem with this: I'm starting a query for each Provider instance in the list. I'd rather use a single query to achieve the same result. This topic is especially important because of questionable performance. How can I achieve the same results with a single query and improve performance using Linq statements or Expression Trees?

问题是:我正在为列表中的每个Provider实例启动查询。我宁愿使用单个查询来实现相同的结果。由于性能可疑,此主题尤其重要。如何使用单个查询获得相同的结果并使用Linq语句或表达式树提高性能?

5 个解决方案

#1


4  

Interesting challenge. The only way I see is to build dynamically UNION ALL query like this:

有趣的挑战。我看到的唯一方法是动态构建UNION ALL查询,如下所示:

SELECT TOP 1 0 FROM Table WHERE Condition[0]
UNION ALL
SELECT TOP 1 1 FROM Table WHERE Condition[1]
...
UNION ALL
SELECT TOP 1 N-1 FROM Table WHERE Condition[N-1]

and then use the returned numbers as index to get the matching providers.

然后使用返回的数字作为索引来获取匹配的提供程序。

Something like this:

像这样的东西:

var parameter = Expression.Parameter(typeof(TEntity), "e");
var indexQuery = providers
    .Select((provider, index) => queryable
        .Where(provider.Condition)
        .Take(1)
        .Select(Expression.Lambda<Func<TEntity, int>>(Expression.Constant(index), parameter)))
    .Aggregate(Queryable.Concat);

var indexes = indexQuery.ToList();
var matchingProviders = indexes.Select(index => providers[index]);

Note that I could have built the query without using Expression class by replacing the above Select with

请注意,我可以通过替换上面的Select with而不使用Expression类来构建查询

.Select(_ => index)

but that would introduce unnecessary SQL query parameter for each index.

但这会为每个索引引入不必要的SQL查询参数。

#2


4  

Here is another (crazy) idea that came in my mind. Please note that similar to my previous answer, it doesn't guarantee better performance (in fact it could be worse). It just presents a way to do what you are asking with a single SQL query.

这是我心中想到的另一个(疯狂)想法。请注意,与我之前的答案类似,它并不能保证更好的性能(事实上它可能更糟)。它只是提供了一种方法,可以通过单个SQL查询来执行您所要求的操作。

Here we are going to create a query that returns a single string with length N consisting of '0' and '1' characters with '1' denoting a match (something like string bit array). The query will use my favorite group by constant technique to build dynamically something like this:

这里我们将创建一个返回单个字符串的查询,其长度为N,由'0'和'1'字符组成,'1'表示匹配(类似于字符串位数组)。查询将通过常量技术使用我最喜欢的组来动态构建这样的东西:

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(g =>
        (g.Max(Condition[0] ? "1" : "0")) +
        (g.Max(Condition[1] ? "1" : "0")) +
            ...
        (g.Max(Condition[N-1] ? "1" : "0")))
    .FirstOrDefault() ?? "";

And here is the code:

这是代码:

var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");

var concatArgs = providers.Select(provider => Expression.Call(
        typeof(Enumerable), "Max", new[] { typeof(TEntity), typeof(string) },
        group, Expression.Lambda(
            Expression.Condition(
                provider.Condition.Body, Expression.Constant("1"), Expression.Constant("0")),
            provider.Condition.Parameters)));

var concatCall = Expression.Call(
    typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), concatArgs));

var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .FirstOrDefault() ?? "";

var matchingProviders = matchInfo.Zip(providers,
    (match, provider) => match == '1' ? provider : null)
    .Where(provider => provider != null)
    .ToList();

Enjoy:)

请享用:)

P.S. In my opinion, this query will run with constant speed (regarding number and type of the conditions, i.e. can be considered O(N) in the best, worst and average cases, where N is the number of the records in the table) because the database has to perform always a full table scan. Still it will be interesting to know what's the actual performance, but most likely doing something like this just doesn't worth the efforts.

附:在我看来,这个查询将以恒定的速度运行(关于条件的数量和类型,即在最佳,最差和平均情况下可以被认为是O(N),其中N是表中记录的数量),因为数据库必须始终执行全表扫描。知道什么是实际表现仍然很有趣,但最有可能做这样的事情并不值得努力。

Update: Regarding the bounty and the updated requirement:

更新:关于赏金和更新的要求:

Find a fast query that only reads a record of the table once and ends the query if already all conditions are met

查找只读取表记录一次的快速查询,如果已满足所有条件,则结束查询

There is no standard SQL construct (not even speaking about LINQ query translation) that satisfies both conditions. The constructs that allow early end like EXISTS can be used for a single condition, thus when executed for multiple conditions will violate the first rule of reading the table record only once. While the constructs that use aggregates like in this answer satisfy the first rule, but in order to produce the aggregate value they have to read all the records, thus cannot exit earlier.

没有标准的SQL构造(甚至不谈论LINQ查询转换)满足这两个条件。允许像EXISTS这样的早期结束的构造可以用于单个条件,因此当针对多个条件执行时,将违反仅读取表记录的第一个规则。虽然在这个答案中使用聚合的构造满足第一个规则,但是为了产生聚合值,它们必须读取所有记录,因此不能提前退出。

Shortly, there is no query that can satisfy both requirements. What about the fast part, it really depends of the size of the data and the number and type of the conditions, table indexes etc., so again there is simply no "best" general solution for all cases.

很快,没有任何查询可以满足这两个要求。快速部分如何,它实际上取决于数据的大小以及条件的数量和类型,表索引等,因此对于所有情况,根本没有“最佳”通用解决方案。

#3


3  

Based on this Post by @Ivan I created an expression that is slightly faster in some cases.

基于@Ivan的这篇文章,我创建了一个在某些情况下稍快的表达式。

It uses Any instead of Max to get the desired results.

它使用Any而不是Max来获得所需的结果。

var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");

var anyMethod = typeof(Enumerable)
    .GetMethods()
    .First(m => m.Name == "Any" && m.GetParameters()
    .Count() == 2)
    .MakeGenericMethod(typeof(TEntity));

var concatArgs = Providers.Select(provider => 
    Expression.Call(anyMethod, group, 
    Expression.Lambda(provider.Condition.Body, provider.Condition.Parameters)));

var convertExpression = concatArgs.Select(concat =>
    Expression.Condition(concat, Expression.Constant("1"), Expression.Constant("0")));

var concatCall = Expression.Call(
    typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), convertExpression));

var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .First();

var MatchingProviders = matchInfo.Zip(Providers,
    (match, provider) => match == '1' ? provider : null)
    .Where(provider => provider != null)
    .ToList();

#4


1  

The approach I tried here was to create Conditions and nest them into one Expression. If one of the Conditions is met, we get the index of the Provider for it.

我在这里尝试的方法是创建条件并将它们嵌套到一个表达式中。如果满足其中一个条件,我们将获得Provider的索引。

private static Expression NestedExpression(
    IEnumerable<Expression<Func<TEntity, bool>>> expressions, 
    int startIndex = 0)
{
    var range = expressions.ToList();
    range.RemoveRange(0, startIndex);

    if (range.Count == 0)
        return Expression.Constant(-1);

    return Expression.Condition(
        range[0].Body, 
        Expression.Constant(startIndex), 
        NestedExpression(expressions, ++startIndex));
}

Because the Expressions still may use different ParameterExpressions, we need an ExpressionVisitor to rewrite those:

因为表达式仍然可以使用不同的ParameterExpressions,我们需要一个ExpressionVisitor来重写它们:

private class PredicateRewriterVisitor : ExpressionVisitor
{
    private readonly ParameterExpression _parameterExpression;

    public PredicateRewriterVisitor(ParameterExpression parameterExpression)
    {
        _parameterExpression = parameterExpression;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return _parameterExpression;
    }
}

For the rewrite we only need to call this method:

对于重写,我们只需要调用此方法:

private static Expression<Func<T, bool>> Rewrite<T>(
    Expression<Func<T, bool>> exp, 
    ParameterExpression parameterExpression)
{
    var newExpression = new PredicateRewriterVisitor(parameterExpression).Visit(exp);
    return (Expression<Func<T, bool>>)newExpression;
}

The query itself and the selection of the Provider instances works like this:

查询本身和Provider实例的选择如下:

var parameterExpression = Expression.Parameter(typeof(TEntity), "src");
var conditions = Providers.Select(provider => 
    Rewrite(provider.Condition, parameterExpression)
);

var nestedExpression = NestedExpression(conditions);
var lambda = Expression.Lambda<Func<TEntity, int>>(nestedExpression, parameterExpression);

var matchInfo = queryable.Select(lambda).Distinct();
var MatchingProviders = Providers.Where((provider, index) => matchInfo.Contains(index));

Note: Another option which isn't really fast as well

注意:另一个选项也不是很快

#5


1  

Here is another view of the problem that has nothing to do with expressions.

以下是与表达式无关的问题的另一种观点。

Since the main goal is to improve the performance, if the attempts to produce the result with single query don't help, we could try improving the speed by parallelizing the execution of the original multi query solution.

由于主要目标是提高性能,如果尝试使用单个查询生成结果没有帮助,我们可以尝试通过并行执行原始多查询解决方案来提高速度。

Since it's really a LINQ to Objects query (which internally executes multiple EF queries), theoretically it should be a simple matter of turning it into a PLINQ query by inserting AsParallel like this (non working):

因为它实际上是一个LINQ to Objects查询(在内部执行多个EF查询),理论上它应该是一个简单的问题,通过像这样插入AsParallel(非工作)将其转换为PLINQ查询:

var matchingProviders = providers
    .AsParallel()
    .Where(provider => queryable.Any(provider.Condition))
    .ToList();

However, it turns out that EF DbContext is not well suited for multi thread access, and the above simply generates runtime errors. So I had to resort to TPL using one of the Parallel.ForEach overloads that allows us to supply local state, which I used to allocate several DbContext instances during the execution.

但是,事实证明EF DbContext不适合多线程访问,而上述只是生成运行时错误。所以我不得不使用其中一个Parallel.ForEach重载来使用TPL,它允许我们提供本地状态,我在执行期间用它来分配几个DbContext实例。

The final working code looks like this:

最终的工作代码如下所示:

var matchingProviders = new List<Provider<TEntity>>();
Parallel.ForEach(providers,
    () => new
    {
        context = new MyDbContext(),
        matchingProviders = new List<Provider<TEntity>>()
    },
    (provider, state, data) =>
    {
        if (data.context.Set<TEntity>().Any(provider.Condition))
            data.matchingProviders.Add(provider);
        return data;
    },
    data =>
    {
        data.context.Dispose();
        if (data.matchingProviders.Count > 0)
        {
            lock (matchingProviders)
                matchingProviders.AddRange(data.matchingProviders);
        }
    }
);

If you have a multi core CPU (which is normal nowadays) and a good database server, this should give you the improvement you are seeking for.

如果你有一个多核CPU(现在很正常)和一个好的数据库服务器,这应该会为你提供所需的改进。

#1


4  

Interesting challenge. The only way I see is to build dynamically UNION ALL query like this:

有趣的挑战。我看到的唯一方法是动态构建UNION ALL查询,如下所示:

SELECT TOP 1 0 FROM Table WHERE Condition[0]
UNION ALL
SELECT TOP 1 1 FROM Table WHERE Condition[1]
...
UNION ALL
SELECT TOP 1 N-1 FROM Table WHERE Condition[N-1]

and then use the returned numbers as index to get the matching providers.

然后使用返回的数字作为索引来获取匹配的提供程序。

Something like this:

像这样的东西:

var parameter = Expression.Parameter(typeof(TEntity), "e");
var indexQuery = providers
    .Select((provider, index) => queryable
        .Where(provider.Condition)
        .Take(1)
        .Select(Expression.Lambda<Func<TEntity, int>>(Expression.Constant(index), parameter)))
    .Aggregate(Queryable.Concat);

var indexes = indexQuery.ToList();
var matchingProviders = indexes.Select(index => providers[index]);

Note that I could have built the query without using Expression class by replacing the above Select with

请注意,我可以通过替换上面的Select with而不使用Expression类来构建查询

.Select(_ => index)

but that would introduce unnecessary SQL query parameter for each index.

但这会为每个索引引入不必要的SQL查询参数。

#2


4  

Here is another (crazy) idea that came in my mind. Please note that similar to my previous answer, it doesn't guarantee better performance (in fact it could be worse). It just presents a way to do what you are asking with a single SQL query.

这是我心中想到的另一个(疯狂)想法。请注意,与我之前的答案类似,它并不能保证更好的性能(事实上它可能更糟)。它只是提供了一种方法,可以通过单个SQL查询来执行您所要求的操作。

Here we are going to create a query that returns a single string with length N consisting of '0' and '1' characters with '1' denoting a match (something like string bit array). The query will use my favorite group by constant technique to build dynamically something like this:

这里我们将创建一个返回单个字符串的查询,其长度为N,由'0'和'1'字符组成,'1'表示匹配(类似于字符串位数组)。查询将通过常量技术使用我最喜欢的组来动态构建这样的东西:

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(g =>
        (g.Max(Condition[0] ? "1" : "0")) +
        (g.Max(Condition[1] ? "1" : "0")) +
            ...
        (g.Max(Condition[N-1] ? "1" : "0")))
    .FirstOrDefault() ?? "";

And here is the code:

这是代码:

var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");

var concatArgs = providers.Select(provider => Expression.Call(
        typeof(Enumerable), "Max", new[] { typeof(TEntity), typeof(string) },
        group, Expression.Lambda(
            Expression.Condition(
                provider.Condition.Body, Expression.Constant("1"), Expression.Constant("0")),
            provider.Condition.Parameters)));

var concatCall = Expression.Call(
    typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), concatArgs));

var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .FirstOrDefault() ?? "";

var matchingProviders = matchInfo.Zip(providers,
    (match, provider) => match == '1' ? provider : null)
    .Where(provider => provider != null)
    .ToList();

Enjoy:)

请享用:)

P.S. In my opinion, this query will run with constant speed (regarding number and type of the conditions, i.e. can be considered O(N) in the best, worst and average cases, where N is the number of the records in the table) because the database has to perform always a full table scan. Still it will be interesting to know what's the actual performance, but most likely doing something like this just doesn't worth the efforts.

附:在我看来,这个查询将以恒定的速度运行(关于条件的数量和类型,即在最佳,最差和平均情况下可以被认为是O(N),其中N是表中记录的数量),因为数据库必须始终执行全表扫描。知道什么是实际表现仍然很有趣,但最有可能做这样的事情并不值得努力。

Update: Regarding the bounty and the updated requirement:

更新:关于赏金和更新的要求:

Find a fast query that only reads a record of the table once and ends the query if already all conditions are met

查找只读取表记录一次的快速查询,如果已满足所有条件,则结束查询

There is no standard SQL construct (not even speaking about LINQ query translation) that satisfies both conditions. The constructs that allow early end like EXISTS can be used for a single condition, thus when executed for multiple conditions will violate the first rule of reading the table record only once. While the constructs that use aggregates like in this answer satisfy the first rule, but in order to produce the aggregate value they have to read all the records, thus cannot exit earlier.

没有标准的SQL构造(甚至不谈论LINQ查询转换)满足这两个条件。允许像EXISTS这样的早期结束的构造可以用于单个条件,因此当针对多个条件执行时,将违反仅读取表记录的第一个规则。虽然在这个答案中使用聚合的构造满足第一个规则,但是为了产生聚合值,它们必须读取所有记录,因此不能提前退出。

Shortly, there is no query that can satisfy both requirements. What about the fast part, it really depends of the size of the data and the number and type of the conditions, table indexes etc., so again there is simply no "best" general solution for all cases.

很快,没有任何查询可以满足这两个要求。快速部分如何,它实际上取决于数据的大小以及条件的数量和类型,表索引等,因此对于所有情况,根本没有“最佳”通用解决方案。

#3


3  

Based on this Post by @Ivan I created an expression that is slightly faster in some cases.

基于@Ivan的这篇文章,我创建了一个在某些情况下稍快的表达式。

It uses Any instead of Max to get the desired results.

它使用Any而不是Max来获得所需的结果。

var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");

var anyMethod = typeof(Enumerable)
    .GetMethods()
    .First(m => m.Name == "Any" && m.GetParameters()
    .Count() == 2)
    .MakeGenericMethod(typeof(TEntity));

var concatArgs = Providers.Select(provider => 
    Expression.Call(anyMethod, group, 
    Expression.Lambda(provider.Condition.Body, provider.Condition.Parameters)));

var convertExpression = concatArgs.Select(concat =>
    Expression.Condition(concat, Expression.Constant("1"), Expression.Constant("0")));

var concatCall = Expression.Call(
    typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), convertExpression));

var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);

var matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .First();

var MatchingProviders = matchInfo.Zip(Providers,
    (match, provider) => match == '1' ? provider : null)
    .Where(provider => provider != null)
    .ToList();

#4


1  

The approach I tried here was to create Conditions and nest them into one Expression. If one of the Conditions is met, we get the index of the Provider for it.

我在这里尝试的方法是创建条件并将它们嵌套到一个表达式中。如果满足其中一个条件,我们将获得Provider的索引。

private static Expression NestedExpression(
    IEnumerable<Expression<Func<TEntity, bool>>> expressions, 
    int startIndex = 0)
{
    var range = expressions.ToList();
    range.RemoveRange(0, startIndex);

    if (range.Count == 0)
        return Expression.Constant(-1);

    return Expression.Condition(
        range[0].Body, 
        Expression.Constant(startIndex), 
        NestedExpression(expressions, ++startIndex));
}

Because the Expressions still may use different ParameterExpressions, we need an ExpressionVisitor to rewrite those:

因为表达式仍然可以使用不同的ParameterExpressions,我们需要一个ExpressionVisitor来重写它们:

private class PredicateRewriterVisitor : ExpressionVisitor
{
    private readonly ParameterExpression _parameterExpression;

    public PredicateRewriterVisitor(ParameterExpression parameterExpression)
    {
        _parameterExpression = parameterExpression;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return _parameterExpression;
    }
}

For the rewrite we only need to call this method:

对于重写,我们只需要调用此方法:

private static Expression<Func<T, bool>> Rewrite<T>(
    Expression<Func<T, bool>> exp, 
    ParameterExpression parameterExpression)
{
    var newExpression = new PredicateRewriterVisitor(parameterExpression).Visit(exp);
    return (Expression<Func<T, bool>>)newExpression;
}

The query itself and the selection of the Provider instances works like this:

查询本身和Provider实例的选择如下:

var parameterExpression = Expression.Parameter(typeof(TEntity), "src");
var conditions = Providers.Select(provider => 
    Rewrite(provider.Condition, parameterExpression)
);

var nestedExpression = NestedExpression(conditions);
var lambda = Expression.Lambda<Func<TEntity, int>>(nestedExpression, parameterExpression);

var matchInfo = queryable.Select(lambda).Distinct();
var MatchingProviders = Providers.Where((provider, index) => matchInfo.Contains(index));

Note: Another option which isn't really fast as well

注意:另一个选项也不是很快

#5


1  

Here is another view of the problem that has nothing to do with expressions.

以下是与表达式无关的问题的另一种观点。

Since the main goal is to improve the performance, if the attempts to produce the result with single query don't help, we could try improving the speed by parallelizing the execution of the original multi query solution.

由于主要目标是提高性能,如果尝试使用单个查询生成结果没有帮助,我们可以尝试通过并行执行原始多查询解决方案来提高速度。

Since it's really a LINQ to Objects query (which internally executes multiple EF queries), theoretically it should be a simple matter of turning it into a PLINQ query by inserting AsParallel like this (non working):

因为它实际上是一个LINQ to Objects查询(在内部执行多个EF查询),理论上它应该是一个简单的问题,通过像这样插入AsParallel(非工作)将其转换为PLINQ查询:

var matchingProviders = providers
    .AsParallel()
    .Where(provider => queryable.Any(provider.Condition))
    .ToList();

However, it turns out that EF DbContext is not well suited for multi thread access, and the above simply generates runtime errors. So I had to resort to TPL using one of the Parallel.ForEach overloads that allows us to supply local state, which I used to allocate several DbContext instances during the execution.

但是,事实证明EF DbContext不适合多线程访问,而上述只是生成运行时错误。所以我不得不使用其中一个Parallel.ForEach重载来使用TPL,它允许我们提供本地状态,我在执行期间用它来分配几个DbContext实例。

The final working code looks like this:

最终的工作代码如下所示:

var matchingProviders = new List<Provider<TEntity>>();
Parallel.ForEach(providers,
    () => new
    {
        context = new MyDbContext(),
        matchingProviders = new List<Provider<TEntity>>()
    },
    (provider, state, data) =>
    {
        if (data.context.Set<TEntity>().Any(provider.Condition))
            data.matchingProviders.Add(provider);
        return data;
    },
    data =>
    {
        data.context.Dispose();
        if (data.matchingProviders.Count > 0)
        {
            lock (matchingProviders)
                matchingProviders.AddRange(data.matchingProviders);
        }
    }
);

If you have a multi core CPU (which is normal nowadays) and a good database server, this should give you the improvement you are seeking for.

如果你有一个多核CPU(现在很正常)和一个好的数据库服务器,这应该会为你提供所需的改进。