优化Cocoa / Objective-C搜索

时间:2022-09-07 20:08:49

I'm performing a search of a large plist file which contains dictionaries, tens of thousands of them, each with 2 key/string pairs. My search algorithms goes through the dictionaries, and when it finds a text match in either of the strings in the dictionary, the contents of the dictionary are inserted. Here is how it works:

我正在搜索一个包含字典的大型plist文件,其中有数以万计,每个都有2个键/字符串对。我的搜索算法遍历字典,当它在字典中的任何一个字符串中找到文本匹配时,将插入字典的内容。下面是它的工作原理:

NSDictionary *eachEntry;
NSArray *rawGlossaryArray = [[NSArray alloc] initWithContentsOfFile:thePath]; // this contains the contents of the plist

for (eachEntry in rawGlossaryArray)
    {
        GlossaryEntry *anEntry = [[GlossaryEntry alloc] initWithDictionary:eachEntry];


        NSRange titleResultsRange = [anEntry.title rangeOfString:filterString options:NSCaseInsensitiveSearch];
        NSRange defResultsRange = [anEntry.definition rangeOfString:filterString options:NSCaseInsensitiveSearch];

        if (titleResultsRange.length > 0 || defResultsRange.length > 0) {
            // store that item in the glossary dictionary with the name as the key
            [glossaryDictionary setObject:anEntry forKey:anEntry.title];

        }
        [anEntry release];
    }

Each time a search is performed, there is a delay of around 3-4 seconds in my iPhone app (on the device at least; everything runs pretty quickly in the simulator). Can anyone advise on how I might optimize this search?

每次执行搜索时,我的iPhone应用程序都会延迟大约3-4秒(至少在设备上;在模拟器中一切都运行得很快)。任何人都可以建议我如何优化此搜索?

7 个解决方案

#1


2  

Without looking at the data set I can't be sure, but if you profile it you are spending the vast percentage of your time in -rangeOfString:options:. If that is the case you will not be able to improve performance without fundamentally changing the data structure you are using to store your data.

没有看数据集我不能确定,但​​是如果你对它进行分析,你将花费大量的时间在-rangeOfString:options:。如果是这种情况,您将无法在不从根本上更改用于存储数据的数据结构的情况下提高性能。

You might want to construct some sort trie with strings and substrings pointing to the objects. It is much more complicated thing to setup, and insertions into it will be more expensive, but lookup would be very fast. Given that you are serializing out the structure anyway expensive inserts should not be much of an issue.

您可能希望构造一些带有指向对象的字符串和子字符串的排序trie。设置要复杂得多,插入它会更加昂贵,但查找速度会非常快。鉴于您正在序列化结构,无论如何昂贵的插入不应该是一个问题。

#2


2  

That just cries out for using a database, that you pre-populate and put into the application.

这只是急于使用数据库,你预先填充并放入应用程序。

#3


1  

A few suggestions:

一些建议:

  1. You're doing a lot of allocing and releasing in that loop. Could you create a single GlossaryEntry before the loop, then just reload it's contents inside the loop? This would avoid a bunch of alloc/releases.

    你在那个循环中做了很多分配和释放。你能在循环之前创建一个GlossaryEntry,然后只是在循环中重新加载它的内容吗?这样可以避免大量的alloc / release。

  2. Rather than loading the file each time, could you lazy load it once and keep it cached in memory (maybe in a singleton type object)? Generally this isn't a good idea on the iPhone, but you could have some code in your "didReceiveMemoryWarning" handler that would free the cache if it became an issue.

    你可以懒得加载一次并将其缓存在内存中(可能是单例类型的对象),而不是每次都加载文件吗?通常这在iPhone上不是一个好主意,但你可以在你的“didReceiveMemoryWarning”处理程序中有一些代码,如果它成为问题就会释放缓存。

#4


1  

You should run your application is Instruments, and see what the bottleneck really is. Performance optimizations in the blind are really difficult, and we have tools to make them clear, and the tools are good too!

你应该运行你的应用程序是Instruments,看看瓶颈究竟是什么。盲人中的性能优化确实很困难,我们有工具可以使它们清晰,工具也很好!

There's also the possibility that this isn't optimizable. I'm not sure if it's actually hanging the UI in your app or just taking a long time. If it's blocking the UI you need to get out of the main thread to do this work. Same with any significant work to keep an app responsive.

还有可能这是不可优化的。我不确定它是否真的在你的应用程序中挂起UI或者只是花了很长时间。如果它阻止了UI,你需要离开主线程来完成这项工作。与保持应用响应的任何重要工作相同。

#5


1  

try the following, and see if you get any improvement:

尝试以下方法,看看你是否有任何改进:

1) use

- (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask

and as mask, pass the value NSLiteralSearch. This may speedup search considerably as described in the Apple documentation (String Programming Guide for Cocoa):

并作为掩码,传递值NSLiteralSearch。这可能会大大加速搜索,如Apple文档(Cocoa的字符串编程指南)中所述:

NSLiteralSearch Performs a byte-for-byte comparison. Differing literal sequences (such as composed character sequences) that would otherwise be considered equivalent are considered not to match. Using this option can speed some operations dramatically.

NSLiteralSearch执行逐字节比较。不同的文字序列(例如组合的字符序列)被认为是等同的被认为是不匹配的。使用此选项可以显着加快某些操作。

2) From the documentation (String Programming Guide for Cocoa):

2)从文档(Cocoa的字符串编程指南):

If you simply want to determine whether a string contains a given pattern, you can use a predicate:

如果您只想确定字符串是否包含给定模式,则可以使用谓词:

BOOL match = [myPredicate evaluateWithObject:myString];

For more about predicates, see Predicate Programming Guide.

有关谓词的更多信息,请参阅谓词编程指南。

#6


1  

You're probably getting the best performance you're likely to get, given your current data structures. You need to change how you're accessing the data, in order to get better performance.

鉴于您当前的数据结构,您可能获得了可能获得的最佳性能。您需要更改访问数据的方式,以获得更好的性能。

Suggestions, in no particular order:

建议,没有特别的顺序:

  1. Don't create your GlossaryEntry objects in a loop while you're filtering them. Rather than storing the data in a Property List, just archive your array of GlossaryEntry objects. See the NSCoding documentation.

    在过滤它们时,不要在循环中创建GlossaryEntry对象。而不是将数据存储在属性列表中,只需归档您的GlossaryEntry对象数组。请参阅NSCoding文档。

  2. Rather than searching through tens of thousands of strings at every keystroke, generate an index of common substrings (maybe 2 or 3 letters), and create an NSDictionary that maps from that common substring to the set of results to use as an index. You can create the index at build time, rather than at run-time. If you can slice up your data set into several smaller pieces, the linear search for matching strings will be considerably faster.

    不是在每次击键时搜索成千上万的字符串,而是生成公共子串的索引(可能是2或3个字母),并创建一个从该公共子串映射到用作索引的结果集的NSDictionary。您可以在构建时创建索引,而不是在运行时创建索引。如果您可以将数据集分割成几个较小的片段,则匹配字符串的线性搜索将会快得多。

  3. Store your data in an SQLite database, and use SQL to query it - probably overkill for just this problem, but allows for more sophisticated searches in the future, if you'll need them.

    将您的数据存储在SQLite数据库中,并使用SQL查询它 - 可能只是针对这个问题,但如果您需要,可以在将来进行更复杂的搜索。

  4. If creating a simple index doesn't work well enough, you'll need to create a search tree style data structure.

    如果创建简单索引不能很好地工作,则需要创建搜索树样式数据结构。

#7


0  

You should profile it in instruments to find where the bottleneck actually is. If I had to guess, I would say the bottleneck would be [[NSArray alloc] initWithContentsOfFile:thePath].

您应该在仪器中对其进行分析,以找出瓶颈的实际位置。如果我不得不猜测,我会说瓶颈是[[NSArray alloc] initWithContentsOfFile:thePath]。

Having said that, you'd probably get the best performance by storing the data in an sqlite database (which you would search with SQL) instead of using a plist.

话虽如此,您可能通过将数据存储在sqlite数据库(您将使用SQL搜索)而不是使用plist来获得最佳性能。

#1


2  

Without looking at the data set I can't be sure, but if you profile it you are spending the vast percentage of your time in -rangeOfString:options:. If that is the case you will not be able to improve performance without fundamentally changing the data structure you are using to store your data.

没有看数据集我不能确定,但​​是如果你对它进行分析,你将花费大量的时间在-rangeOfString:options:。如果是这种情况,您将无法在不从根本上更改用于存储数据的数据结构的情况下提高性能。

You might want to construct some sort trie with strings and substrings pointing to the objects. It is much more complicated thing to setup, and insertions into it will be more expensive, but lookup would be very fast. Given that you are serializing out the structure anyway expensive inserts should not be much of an issue.

您可能希望构造一些带有指向对象的字符串和子字符串的排序trie。设置要复杂得多,插入它会更加昂贵,但查找速度会非常快。鉴于您正在序列化结构,无论如何昂贵的插入不应该是一个问题。

#2


2  

That just cries out for using a database, that you pre-populate and put into the application.

这只是急于使用数据库,你预先填充并放入应用程序。

#3


1  

A few suggestions:

一些建议:

  1. You're doing a lot of allocing and releasing in that loop. Could you create a single GlossaryEntry before the loop, then just reload it's contents inside the loop? This would avoid a bunch of alloc/releases.

    你在那个循环中做了很多分配和释放。你能在循环之前创建一个GlossaryEntry,然后只是在循环中重新加载它的内容吗?这样可以避免大量的alloc / release。

  2. Rather than loading the file each time, could you lazy load it once and keep it cached in memory (maybe in a singleton type object)? Generally this isn't a good idea on the iPhone, but you could have some code in your "didReceiveMemoryWarning" handler that would free the cache if it became an issue.

    你可以懒得加载一次并将其缓存在内存中(可能是单例类型的对象),而不是每次都加载文件吗?通常这在iPhone上不是一个好主意,但你可以在你的“didReceiveMemoryWarning”处理程序中有一些代码,如果它成为问题就会释放缓存。

#4


1  

You should run your application is Instruments, and see what the bottleneck really is. Performance optimizations in the blind are really difficult, and we have tools to make them clear, and the tools are good too!

你应该运行你的应用程序是Instruments,看看瓶颈究竟是什么。盲人中的性能优化确实很困难,我们有工具可以使它们清晰,工具也很好!

There's also the possibility that this isn't optimizable. I'm not sure if it's actually hanging the UI in your app or just taking a long time. If it's blocking the UI you need to get out of the main thread to do this work. Same with any significant work to keep an app responsive.

还有可能这是不可优化的。我不确定它是否真的在你的应用程序中挂起UI或者只是花了很长时间。如果它阻止了UI,你需要离开主线程来完成这项工作。与保持应用响应的任何重要工作相同。

#5


1  

try the following, and see if you get any improvement:

尝试以下方法,看看你是否有任何改进:

1) use

- (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask

and as mask, pass the value NSLiteralSearch. This may speedup search considerably as described in the Apple documentation (String Programming Guide for Cocoa):

并作为掩码,传递值NSLiteralSearch。这可能会大大加速搜索,如Apple文档(Cocoa的字符串编程指南)中所述:

NSLiteralSearch Performs a byte-for-byte comparison. Differing literal sequences (such as composed character sequences) that would otherwise be considered equivalent are considered not to match. Using this option can speed some operations dramatically.

NSLiteralSearch执行逐字节比较。不同的文字序列(例如组合的字符序列)被认为是等同的被认为是不匹配的。使用此选项可以显着加快某些操作。

2) From the documentation (String Programming Guide for Cocoa):

2)从文档(Cocoa的字符串编程指南):

If you simply want to determine whether a string contains a given pattern, you can use a predicate:

如果您只想确定字符串是否包含给定模式,则可以使用谓词:

BOOL match = [myPredicate evaluateWithObject:myString];

For more about predicates, see Predicate Programming Guide.

有关谓词的更多信息,请参阅谓词编程指南。

#6


1  

You're probably getting the best performance you're likely to get, given your current data structures. You need to change how you're accessing the data, in order to get better performance.

鉴于您当前的数据结构,您可能获得了可能获得的最佳性能。您需要更改访问数据的方式,以获得更好的性能。

Suggestions, in no particular order:

建议,没有特别的顺序:

  1. Don't create your GlossaryEntry objects in a loop while you're filtering them. Rather than storing the data in a Property List, just archive your array of GlossaryEntry objects. See the NSCoding documentation.

    在过滤它们时,不要在循环中创建GlossaryEntry对象。而不是将数据存储在属性列表中,只需归档您的GlossaryEntry对象数组。请参阅NSCoding文档。

  2. Rather than searching through tens of thousands of strings at every keystroke, generate an index of common substrings (maybe 2 or 3 letters), and create an NSDictionary that maps from that common substring to the set of results to use as an index. You can create the index at build time, rather than at run-time. If you can slice up your data set into several smaller pieces, the linear search for matching strings will be considerably faster.

    不是在每次击键时搜索成千上万的字符串,而是生成公共子串的索引(可能是2或3个字母),并创建一个从该公共子串映射到用作索引的结果集的NSDictionary。您可以在构建时创建索引,而不是在运行时创建索引。如果您可以将数据集分割成几个较小的片段,则匹配字符串的线性搜索将会快得多。

  3. Store your data in an SQLite database, and use SQL to query it - probably overkill for just this problem, but allows for more sophisticated searches in the future, if you'll need them.

    将您的数据存储在SQLite数据库中,并使用SQL查询它 - 可能只是针对这个问题,但如果您需要,可以在将来进行更复杂的搜索。

  4. If creating a simple index doesn't work well enough, you'll need to create a search tree style data structure.

    如果创建简单索引不能很好地工作,则需要创建搜索树样式数据结构。

#7


0  

You should profile it in instruments to find where the bottleneck actually is. If I had to guess, I would say the bottleneck would be [[NSArray alloc] initWithContentsOfFile:thePath].

您应该在仪器中对其进行分析,以找出瓶颈的实际位置。如果我不得不猜测,我会说瓶颈是[[NSArray alloc] initWithContentsOfFile:thePath]。

Having said that, you'd probably get the best performance by storing the data in an sqlite database (which you would search with SQL) instead of using a plist.

话虽如此,您可能通过将数据存储在sqlite数据库(您将使用SQL搜索)而不是使用plist来获得最佳性能。