Finding a string in binary data

Time: 2022-06-12 18:04:29

I have a binary file I've loaded using an NSData object. Is there a way to locate a sequence of characters, 'abcd' for example, within that binary data and return the offset without converting the entire file to a string? Seems like it should be a simple answer, but I'm not sure how to do it. Any ideas?

I'm doing this on iOS 3 so I don't have -rangeOfData:options:range: available.

I'm going to award this one to Sixteen Otto for suggesting strstr. I went and found the source code for the C function strstr and rewrote it to work on a fixed length Byte array--which incidentally is different from a char array as it is not null terminated. Here is the code I ended up with:

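// Returns a pointer to the first occurrence of the pattern `bytes` in `buffer`,
// or NULL if there is none. `bytes` must be zero-terminated (so it cannot
// contain a 0x00 byte); `buffer` is raw data of `len` bytes with no terminator.
- (Byte*)offsetOfBytes:(Byte*)bytes inBuffer:(const Byte*)buffer ofLength:(int)len
{
    // An empty pattern matches at the start of the buffer.
    if ( !*bytes )
        return (Byte*)buffer;

    for (int i = 0; i < len; ++i)
    {
        const Byte *s1 = buffer + i;   // current position in the buffer
        const Byte *s2 = bytes;        // current position in the pattern
        int remaining = len - i;       // bytes left in the buffer from here

        // Advance while the pattern continues and the bytes agree,
        // never reading past the end of the buffer.
        while ( remaining > 0 && *s2 && *s1 == *s2 )
            s1++, s2++, remaining--;

        // Reached the pattern's terminator: full match starting at offset i.
        if ( !*s2 )
            return (Byte*)(buffer + i);
    }

    return NULL;
}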

This returns a pointer to the first occurrence of bytes (the sequence I'm looking for) in buffer (the byte array that should contain it), or NULL if it isn't found.

I call it like this:

// data is the NSData object; tag is the byte sequence to look for
const Byte *bytes = [data bytes];
Byte* index = [self offsetOfBytes:tag inBuffer:bytes ofLength:(int)[data length]];
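
For readers who would rather not depend on a zero byte to mark the end of the pattern, here is a minimal sketch in plain C that takes an explicit length for both the data and the pattern. The name `find_bytes` and its parameters are my own and are not part of the original post; treat it as one possible variant rather than the code the poster used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns a pointer to the
 * first occurrence of needle[0..needle_len) inside hay[0..hay_len), or NULL.
 * Both arguments are raw bytes, so 0x00 is treated like any other value. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return hay;                 /* empty pattern matches at the start */
    if (needle_len > hay_len)
        return NULL;                /* pattern cannot fit in the data */

    /* Last start position at which the pattern still fits in the data. */
    size_t last = hay_len - needle_len;
    for (size_t i = 0; i <= last; ++i) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    }
    return NULL;
}
```

With an NSData object this could be called as `find_bytes([data bytes], [data length], pattern, patternLength)`, and the offset asked for in the question is the returned pointer minus the start of the data.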

3 solutions

#1


14  

Convert your substring to an NSData object, and search for those bytes in the larger NSData using rangeOfData:options:range:. Make sure that the string encodings match!

On iPhone, where that isn't available, you may have to do this yourself. The C function strstr() will give you a pointer to the first occurrence of a pattern within the buffer (as long as neither contains null bytes!), but not the index. Here's a function that should do the job (but no promises, since I haven't tried actually running it...):

- (NSUInteger)indexOfData:(NSData*)needle inData:(NSData*)haystack
{
    // use byte pointers: a const void* cannot be indexed
    const unsigned char* needleBytes = [needle bytes];
    const unsigned char* haystackBytes = [haystack bytes];
    NSUInteger needleLength = [needle length];
    NSUInteger haystackLength = [haystack length];

    // a needle longer than the haystack can never match; checking first also
    // keeps the unsigned subtraction below from wrapping around
    if (needleLength > haystackLength)
    {
        return NSNotFound;
    }

    // walk the length of the buffer, looking for a byte that matches the start
    // of the pattern; we can skip (|needle|-1) bytes at the end, since we can't
    // have a match that's shorter than needle itself
    for (NSUInteger i=0; i <= haystackLength-needleLength; i++)
    {
        // walk needle's bytes while they still match the bytes of haystack
        // starting at i; if we walk off the end of needle, we found a match
        NSUInteger j=0;
        while (j < needleLength && needleBytes[j] == haystackBytes[i+j])
        {
            j++;
        }
        if (j == needleLength)
        {
            return i;
        }
    }
    return NSNotFound;
}

This runs in something like O(nm), where n is the buffer length, and m is the size of the substring. It's written to work with NSData for two reasons: 1) that's what you seem to have in hand, and 2) those objects already encapsulate both the actual bytes, and the length of the buffer.
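
To make the caveat about strstr() concrete: it treats both arguments as C strings, so it stops at the first zero byte and never sees anything after it. The sketch below, in plain C, is a length-aware search that returns an index instead. The helper name `index_of_bytes`, the `long` return type, and the choice to return -1 for "not found" or an empty pattern are my own assumptions, not part of this answer. It uses memchr to jump to candidate positions, which tends to help in practice, but the worst case is still O(nm).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in hay,
 * or -1 if it is absent (an empty needle is also reported as -1 here). */
static long index_of_bytes(const unsigned char *hay, size_t hay_len,
                           const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;

    const unsigned char *pos = hay;
    /* Number of start positions, counted from pos, still worth trying. */
    size_t candidates = hay_len - needle_len + 1;

    while (candidates > 0) {
        /* Jump to the next byte equal to the pattern's first byte. */
        const unsigned char *hit = memchr(pos, needle[0], candidates);
        if (hit == NULL)
            return -1;
        if (memcmp(hit, needle, needle_len) == 0)
            return (long)(hit - hay);
        /* Resume one byte past the failed candidate. */
        candidates -= (size_t)(hit - pos) + 1;
        pos = hit + 1;
    }
    return -1;
}
```

On a buffer such as `{ 'h', 'd', 'r', 0x00, 'a', 'b', 'c', 'd', 0x00 }`, strstr() looking for "abcd" returns NULL because the string it sees ends after "hdr", while the length-aware search finds the pattern past the zero byte.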

#2


1  

If you're using Snow Leopard, a convenient way is the new -rangeOfData:options:range: method in NSData that returns the range of the first occurrence of a piece of data. Otherwise, you can access the NSData's contents yourself using its -bytes method to perform your own search.

#3


1  

I had the same problem. I solved it the other way round from the other suggestions.

First, I convert the data to a string (assume your NSData is stored in the variable rawFile) with:

NSString *ascii = [[NSString alloc] initWithData:rawFile encoding:NSASCIIStringEncoding];

Now you can easily search for strings such as 'abcd', or whatever you want, by passing the ascii string to an NSScanner. This may not be very efficient, but it works until -rangeOfData:options:range: becomes available on iPhone as well.
