How do I filter a list by search terms in C# and then order it by number of matches?

Time: 2022-09-13 08:35:30

I have a Filter Method in my User Class, that takes in a list of Users and a string of search terms. Currently, the FindAll predicate splits the terms on spaces then returns a match if any of the searchable properties contain any part of the terms.

public static List<User> FilterBySearchTerms( List<User> usersToFilter, string searchTerms, bool searchEmailText )
{
    return usersToFilter.FindAll( user =>
    {
        // Convert to lower case for better comparison, trim white space and then split on spaces to search for all terms
        string[] terms = searchTerms.ToLower().Trim().Split( ' ' );

        foreach ( string term in terms )
        {
            // TODO: Is this any quicker than two separate ifs?
            if ( 
                    (searchEmailText && user.Email.ToLower().Contains( term )) 
                    || (
                        user.FirstName.ToLower().Contains( term ) || user.Surname.ToLower().Contains( term ) 
                        || user.Position.ToLower().Contains( term ) || user.Company.ToLower().Contains( term ) 
                        || user.Office.ToLower().Contains( term ) 
                        || user.Title.ToLower().Contains( term )
                    )
            )
                return true;
            // Search UserID by encoded UserInviteID
            else 
            {
                int encodedID;
                if ( int.TryParse( term, out encodedID ) )
                {
                    User fromInvite = GetByEncodedUserInviteID( encodedID );
                    if ( fromInvite != null && fromInvite.ID.HasValue && fromInvite.ID.Value == user.ID )
                        return true;
                }
            }
        }

        return false;
    } );
}

I have received a new requirement so that the ordering is now important. For example, a search for 'Mr Smith' should have Mr Adam Smith ahead of Mrs Eve Smith, which might make my use of Contains inappropriate. However, the most important thing is the number of property/part of term matches.

I'm thinking I could have a couple of counters to keep track of complete term matches and partial matches, then order by those two. I'm also open to suggestions on how the Filter method can be improved - perhaps using something else entirely.
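
The two-counter idea could be sketched roughly as follows. This is only one reading of the requirement: it treats a field that equals a term as a complete match, a field that merely contains the term as a partial match, and sorts by complete matches first. The cut-down User class, the RankedSearch class and the FilterAndRank and CountMatches names are all hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cut-down User; the real class has more properties.
public class User
{
    public string Title { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
}

public static class RankedSearch
{
    public static List<User> FilterAndRank(List<User> users, string searchTerms)
    {
        // Split once, dropping the empty entries produced by extra spaces.
        string[] terms = searchTerms.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        return users
            .Select(u => new
            {
                User = u,
                // Complete match: the field is exactly the term.
                Full = CountMatches(u, terms, (field, term) => field == term),
                // Partial match: the field contains the term but is not equal to it.
                Partial = CountMatches(u, terms,
                    (field, term) => field != term && field.Contains(term))
            })
            .Where(x => x.Full + x.Partial > 0)
            .OrderByDescending(x => x.Full)
            .ThenByDescending(x => x.Partial)
            .Select(x => x.User)
            .ToList();
    }

    // Counts how many (term, field) pairs satisfy the given test.
    private static int CountMatches(
        User user, string[] terms, Func<string, string, bool> isMatch)
    {
        string[] fields = { user.Title, user.FirstName, user.Surname };
        return terms.Sum(term =>
            fields.Count(f => f != null && isMatch(f.ToLowerInvariant(), term)));
    }
}
```

With the search "Mr Smith", Mr Adam Smith gets two complete matches, while Mrs Eve Smith gets one complete match (Smith) and one partial match (Mrs contains mr), so Adam sorts first. If the total number of matches matters more than the complete/partial split, ordering by Full + Partial first would be the obvious variation.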

3 Answers

#1


Here's a LINQ-based solution. It will be a bit more of a pain if you're not using .NET 3.5, I'm afraid. It separates out the details of the matching from the query itself, for clarity.

You'll need to create a LowerCaseUser method which returns a User object with all the properties lower-cased - it makes more sense to do that once than for every search term. If you can put that and UserMatches into the User class, so much the better. Anyway, here's the code.

public static List<User> FilterBySearchTerms
    (List<User> usersToFilter, 
     string searchTerms,
     bool searchEmailText)
{
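    // Just split the search terms once, rather than for each user.
    // RemoveEmptyEntries drops the empty strings produced by extra spaces,
    // which would otherwise match every user (Contains("") is always true).
    string[] terms = searchTerms.ToLower().Split(
        new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);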

    return (from user in usersToFilter
            let lowerUser = LowerCaseUser(user)
            let matchCount = terms.Count(term => 
                                         UserMatches(lowerUser, term, searchEmailText))
            where matchCount != 0
            orderby matchCount descending
            select user).ToList();
}

private static bool UserMatches(User user, string term,
                                bool searchEmailText)
{
    if ((searchEmailText && user.Email.Contains(term))
        || user.FirstName.Contains(term)
        || user.Surname.Contains(term)
        || user.Position.Contains(term)
        || user.Company.Contains(term)
        || user.Office.Contains(term)
        || user.Title.Contains(term))
    {
        return true;
    }
    int encodedID;
    if (int.TryParse(term, out encodedID))
    {
        User fromInvite = GetByEncodedUserInviteID(encodedID);
        // Let the compiler handle the null/non-null comparison
        if (fromInvite != null && fromInvite.ID == user.ID)
        {
            return true;
        }
    }
    return false;
}
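
The answer leaves LowerCaseUser to the reader. A minimal sketch might look like the following, assuming User is a plain class with settable string properties and a nullable int ID (the question's use of ID.HasValue suggests this, but it is a guess). The UserHelpers class and the Lower helper are hypothetical names.

```csharp
using System;

// Hypothetical shape of User, inferred from the properties the thread uses.
public class User
{
    public int? ID { get; set; }
    public string Email { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
    public string Position { get; set; }
    public string Company { get; set; }
    public string Office { get; set; }
    public string Title { get; set; }
}

public static class UserHelpers
{
    // Returns a copy of the user with every searchable string lower-cased.
    // The original object is left untouched.
    public static User LowerCaseUser(User user)
    {
        return new User
        {
            ID = user.ID,
            Email = Lower(user.Email),
            FirstName = Lower(user.FirstName),
            Surname = Lower(user.Surname),
            Position = Lower(user.Position),
            Company = Lower(user.Company),
            Office = Lower(user.Office),
            Title = Lower(user.Title)
        };
    }

    // Null becomes "" so that later Contains calls cannot throw.
    private static string Lower(string value)
    {
        return (value ?? string.Empty).ToLowerInvariant();
    }
}
```

Copying into a new object keeps the original casing available for display, which matters because the query selects the original user rather than the lower-cased copy.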

#2


The first thing I'd say you need to do is break the large short-circuiting || condition into separate conditions. Because || stops evaluating at the first operand that is true, the combined test can only tell you that something matched, never how many properties matched. After this you'll probably need a score for each user which reflects how well the search terms match it.

I'm also assuming you are able to use LINQ here as you are already using lambda expressions.

    class ScoredUser
    {
        public User User { get; set; }
        public int Score { get; set; }
    }

    public static List<User> FilterBySearchTerms(List<User> usersToFilter, string searchTerms, bool searchEmailText)
    {
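        // Convert to lower case for better comparison, then split on spaces to search for all terms.
        // RemoveEmptyEntries discards the empty strings left by leading, trailing or repeated spaces;
        // an empty term would match every property, since Contains("") is always true.
        string[] terms = searchTerms.ToLower().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);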

        // Run a select statement to user list which converts them to
        // a scored object.
        return usersToFilter.Select(user =>
        {
            ScoredUser scoredUser = new ScoredUser()
            {
                User = user,
                Score = 0
            };

            foreach (string term in terms)
            {
                if (searchEmailText && user.Email.ToLower().Contains(term))
                    scoredUser.Score++;

                if (user.FirstName.ToLower().Contains(term))
                    scoredUser.Score++;

                if (user.Surname.ToLower().Contains(term))
                    scoredUser.Score++;

                // etc.
            }

            return scoredUser;

            // Select all scored users with score greater than 0, order by score and select the users.
        }).Where(su => su.Score > 0).OrderByDescending(su => su.Score).Select(su => su.User).ToList();
    }

Having the method build a scored user also makes it easy to tweak the scoring balance later on. Say, for example, you want a matching first name to matter more than a matching company.
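
Weighting could be as simple as adding a different amount per property. The sketch below is only an illustration: the weights (3 for a name, 1 for a company), the cut-down User class and the WeightedSearch, Score and Has names are all made up, and the terms are assumed to be lower-cased already.

```csharp
using System;

// Hypothetical cut-down User; the real class has more properties.
public class User
{
    public string FirstName { get; set; }
    public string Surname { get; set; }
    public string Company { get; set; }
}

public static class WeightedSearch
{
    // Illustrative weights only; tune them to taste.
    private const int NameWeight = 3;
    private const int CompanyWeight = 1;

    // Sums the weight of every property that contains each term.
    // Terms are expected to be lower-cased by the caller.
    public static int Score(User user, string[] terms)
    {
        int score = 0;
        foreach (string term in terms)
        {
            if (Has(user.FirstName, term)) score += NameWeight;
            if (Has(user.Surname, term)) score += NameWeight;
            if (Has(user.Company, term)) score += CompanyWeight;
        }
        return score;
    }

    // Null-safe, case-insensitive containment test.
    private static bool Has(string field, string term)
    {
        return field != null && field.ToLowerInvariant().Contains(term);
    }
}
```

In the Select above, the per-property Score++ statements would then become a single Score = WeightedSearch.Score(user, terms), with the Where and OrderByDescending steps unchanged.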

#3


Is the distinction between full and partial matching what matters here, or would standard lexicographical sorting do? If you sorted Mr. Adam Smith and Mrs. Eve Smith lexicographically, they would be placed in that order. In that case you could simply use a standard ordering lambda.
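
If plain lexicographical ordering turns out to be enough, a sketch along these lines might do. It assumes the sort key is title, then first name, then surname, which is a guess at what is intended; the cut-down User class and the NameOrdering and SortByName names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cut-down User; the real class has more properties.
public class User
{
    public string Title { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
}

public static class NameOrdering
{
    // Orders users by title, then first name, then surname,
    // using a case-insensitive ordinal comparison.
    public static List<User> SortByName(IEnumerable<User> users)
    {
        return users
            .OrderBy(u => u.Title, StringComparer.OrdinalIgnoreCase)
            .ThenBy(u => u.FirstName, StringComparer.OrdinalIgnoreCase)
            .ThenBy(u => u.Surname, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

"Mr" sorts before "Mrs" because it is a prefix of the longer string, so Mr Adam Smith lands ahead of Mrs Eve Smith. Note that this ordering ignores the search terms entirely, so it only works as a tie-breaker or when match quality does not matter.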
