Which algorithm should I use for one-class classification of a signal (sound)?

Update: this question was previously titled "Give me the name of a simple algorithm for signal (sound) pattern detection".

My objective is to detect the presence of a given pattern in a noisy signal. I want to detect the presence of a species of insect by recording its sounds with a microphone. I have previously recorded the sound of the insect in a digital format.

I am not trying to do voice recognition.

I am already using convolution between the input signal and the pattern to determine their level of similarity. But I think this technique is better suited to discrete time (e.g. digital communications, where signals occur at fixed intervals) and to deciding which of two given patterns an input signal matches (I have only one pattern).

I am afraid to use neural networks because I have never used them, and I don't know whether I could embed that code.

Could you please point me to some other approaches, or try to convince me that my current approach is still a good idea, or that neural networks may be a feasible way?

Update: I already have two good answers, but another one would be welcome, and even rewarded.

11 solutions

#1

A step up from convolution is dynamic time warping, which can be thought of as a convolution operator that stretches and shrinks one signal to optimally match another.
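
To make the stretching and shrinking idea concrete, here is a minimal sketch of the textbook dynamic time warping distance in plain Python. The function name `dtw_distance` and the toy sequences are made up for illustration, and a real recording would normally be compared on per-frame features rather than raw samples.

```python
def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Either advance in one sequence only (stretch) or in both (match)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy data: the same shape played at half speed, and a flat signal
template = [0, 1, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
different = [2, 2, 2, 2, 2]

print(dtw_distance(template, stretched))  # small: same shape, different speed
print(dtw_distance(template, different))  # larger: different shape
```

The cost grows with the product of the two lengths, so this is probably only practical on short windows or downsampled features.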

Perhaps a simpler approach would be to take an FFT of the sample and determine whether your insect produces any particular frequencies that can be filtered on.
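
As a rough illustration of looking for a characteristic frequency, the sketch below computes a power spectrum and reports the strongest non-DC bin. It uses a naive DFT from the standard library so that it runs anywhere; in practice you would use a real FFT routine. The 8000 Hz sample rate and the 2000 Hz "insect tone" are assumptions invented for the example.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (fine for short demos)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the DC bin."""
    spectrum = power_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(samples)


# Hypothetical signal: a strong 2000 Hz tone plus a weaker 500 Hz component
sample_rate = 8000
n = 256
signal = [
    math.sin(2 * math.pi * 2000 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 500 * t / sample_rate)
    for t in range(n)
]

print(dominant_frequency(signal, sample_rate))
```

If the peak sits well away from the background noise, a band-pass filter around it is a natural next step.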

On the more complex side, but not quite a neural network, are SVM toolkits like libsvm and svmlight that you can throw your data at.

Regardless of the path you attempt, I would spend time exploring the nature of the sound your insect makes using tools like the FFT. After all, it will be easier to teach a computer to classify the sound if you can do it yourself.

#2

This sounds like a typical one-class classification problem, i.e. you want to find one thing in a large pool of other things you don't care about.

What you want to do is find a set of features or descriptors that you can calculate for every short piece of your raw recording and then match against the features your clean recording produces. I don't think convolution is necessarily bad, though it is rather sensitive to noise, so it might not be optimal for your case. What might actually work in your case is pattern matching on a binned Fourier transform. You take the Fourier transform of your signal, giving you a power vs. frequency graph (rather than a power vs. time graph), then you divide the frequency axis into bands and take the average power in each band as a feature. If your data contains mostly white noise, the pattern you get from a raw insect sound of similar length will very closely match the pattern of your reference sound. This last trick has been used successfully (with some windowing) to crack audio CAPTCHAs as used by Google et al. to make their sites accessible to the blind.
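
The binned Fourier transform idea can be sketched roughly as follows. This is only an illustration under assumptions of my own: eight equal-width bands, features normalised to sum to 1, Euclidean distance as the match score, and synthetic tones standing in for real recordings. The naive DFT keeps the example dependency-free and would be replaced by an FFT in practice.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_features(samples, n_bands):
    """Average power in n_bands equal-width bands, normalised to sum to 1."""
    spectrum = power_spectrum(samples)[1:]  # drop the DC bin
    width = len(spectrum) // n_bands
    feats = [sum(spectrum[b * width:(b + 1) * width]) / width for b in range(n_bands)]
    total = sum(feats) or 1.0
    return [f / total for f in feats]


def feature_distance(f, g):
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, g)))


random.seed(0)
n, rate = 128, 8000


def tone(freq):
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


# Hypothetical data: clean reference, the same call in white noise, another sound
reference = tone(2000)
noisy_insect = [s + random.gauss(0, 0.3) for s in tone(2000)]
other_sound = [s + random.gauss(0, 0.3) for s in tone(500)]

ref_features = band_features(reference, 8)
d_same = feature_distance(ref_features, band_features(noisy_insect, 8))
d_other = feature_distance(ref_features, band_features(other_sound, 8))
print(d_same, d_other)  # the noisy insect call should be the closer of the two
```

A detection threshold on the distance would have to be tuned on real recordings; nothing here suggests a particular value.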

By the way, because your raw audio signal is digital (otherwise processing it with a computer would not work ;-)), convolution is appropriate. You should perform the convolution between your reference signal and a window of equal length from the raw input, starting at each sample in turn. So, if your reference signal has length N and your raw recording has length M, where M >= N, you should perform P = M-N+1 convolutions between your reference signal and P windows of your raw input starting at positions 1..P. The most likely location of the reference sound in the raw recording is the window with the highest convolution score. Note that this becomes insanely time-consuming very quickly.
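
The sliding comparison over P = M-N+1 windows might look like the sketch below. Note one substitution: instead of a raw convolution score it uses normalised cross-correlation per window, which I chose so that the score does not depend on recording volume. The function names and the toy data are made up for the example.

```python
import math


def normalized_correlation(x, y):
    """Pearson-style correlation of two equal-length sequences, in [-1, 1]."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def best_match(raw, reference):
    """Score every window of len(reference) in raw; return (offset, score) of the best."""
    n = len(reference)
    scores = [
        normalized_correlation(raw[i:i + n], reference)
        for i in range(len(raw) - n + 1)  # P = M - N + 1 windows
    ]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Hypothetical data: the reference appears at half volume, starting at offset 4
reference = [0.0, 1.0, -1.0, 2.0, -2.0, 1.0]
raw = [0.1, -0.2, 0.05, 0.0] + [0.5 * v for v in reference] + [0.1, 0.0, -0.1]

offset, score = best_match(raw, reference)
print(offset, score)
```

This direct form costs about N operations per window, which is where the time goes; FFT-based correlation is the usual way to speed it up.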

Fourier-transform-based matching as I explained above, using 50% overlapping samples from your raw data, each twice the length of your reference sample, would at least be faster (though not necessarily better).

#3

Some more information is needed.

When you say noisy signal, what is the background noise? Is it, to a first approximation, stationary (in the statistical sense, i.e. constant), or is it non-stationary (i.e. likely to contain other sounds, such as other animal calls)?

If the background noise is non-stationary, then your best bet might be to use something called Independent Component Analysis (ICA), which attempts to separate a given sound mixture into its component sources; you wouldn't even need the original recording of the insect itself. Lots of ICA software is linked from the Wikipedia page.

(Edit: ICA is one case of Blind Source Separation (BSS). There are many other ways of doing BSS, and it might help to search for those as well.)

If, however, the background noise is stationary, then the problem is much easier (though still very hard):

In this case the approach I would use is as follows. Analyse the amplitude spectrum of a bit of the noise and the amplitude spectrum of your insect call. If you're lucky, the insect call may, in general, be in a different frequency band from the noise. If so, filter the incoming signal with a suitable high-, low-, or band-pass filter.

You can then try comparing sections of your filtered signal that contain "more energy" than average with your (filtered) insect call, possibly by using the image similarity algorithms suggested by A. Rex.
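
One simple way to pick out the sections with "more energy" than average is to cut the signal into frames and keep those whose energy exceeds the mean. This is only a sketch: non-overlapping frames, the mean as threshold, and the toy signal are all assumptions of mine, not something the approach prescribes.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def loud_frames(samples, frame_len):
    """Indices of frames whose energy is above the average frame energy."""
    energies = frame_energies(samples, frame_len)
    mean = sum(energies) / len(energies)
    return [i for i, e in enumerate(energies) if e > mean]


# Hypothetical signal: quiet background with one short burst in the middle
signal = [0.01] * 40 + [1.0, -1.0] * 5 + [0.01] * 30

print(loud_frames(signal, 10))  # frame indices worth comparing against the call
```

Only the flagged frames then need the more expensive comparison against the reference call.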

Edit: Since your background noise is non-stationary, I can only suggest that searching for Blind Source Separation of non-Gaussian sources may lead you to some more algorithms. I'm afraid the answer is that there is no simple algorithm that will do what you want.

#4

If I were you, I would start by reading a little about window functions such as the Hamming window; this is a good starting point for sound recognition. (This is, of course, combined with the Fourier transform.)
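
For reference, a Hamming window is just a set of weights that taper each frame towards its edges before the Fourier transform is taken. The sketch below uses the common 0.54/0.46 coefficients; the helper names and the constant test frame are only for illustration.

```python
import math


def hamming(n):
    """Hamming window of length n (symmetric form, 0.54/0.46 coefficients)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame):
    """Multiply a frame sample-by-sample with a Hamming window of the same length."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


# A constant frame makes the taper visible: the edges shrink, the middle stays near 1
frame = [1.0] * 8
windowed = apply_window(frame)
print([round(v, 3) for v in windowed])
```

Tapering the edges reduces the spectral leakage you get from cutting a frame out of a longer signal.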

#5

You can try a Matched Filter. Although I've never actually used one, I've heard good things.

Also, although not simple, I think a Hidden Markov Model (HMM; I know you said no speech recognition, but hear me out!) would provide the best results for you. Again, I've never actually used one, but there are open source implementations available all over the place. You would just need to train it using your existing "clean" insect recording. Here is one open source implementation: General Hidden Markov Model Library.

#6

Admittedly this is not my area of expertise, but my first thought is a recursive least squares filter - it performs autocorrelation. It's similar to the convolution filter you're using now, but a bit more advanced. Kalman filtering is an extension of this - it's used to regenerate a signal from multiple noisy measurements, so it's probably not useful in this case. I would not reject neural networks offhand - they're very useful at this sort of thing (provided you train them properly).

Thinking about this in more depth, I would probably recommend using an FFT. Chances are the signal you're looking for is very band-limited, and you'd probably have more luck using a bandpass filter on the data, then an FFT, and finally using your simple convolution filter on that data instead of on the time-domain data points. Or do both and have twice the data. I'm not heavily into math, so I can't tell you whether you'll get significant (not linearly dependent) results using this method, but the only thing you're losing is time.

#7

You may be interested in the MA Toolbox, a Matlab implementation of similarity measure(s).

I personally found this paper, General sound classification and similarity in MPEG-7, interesting. However, it might be behind a paywall (I don't know) and it might not be that useful in practice.

The GPL-ed framework Marsyas has a tool for machine learning classification, called kea. My guess is that this probably does not do what you want or is too much effort to hook up to.

My only idea otherwise is to take Fourier transforms, effectively transforming your sounds into grayscale images. Then use one of the many image similarity algorithms.

#8

A Naive Bayes Classifier may be worthwhile here, classifying sound samples into ones which contain your species of interest and ones which do not. It works quite well for complex phenomena; I once used it to decide if a given millimeter-wave RADAR data set contained an obstacle such as brush, a tank trap, etc. As for how to break up your continuous data into discrete chunks for the Bayesian classifier, you might just slide along the continuous data set and break off chunks equal in length to your insect sample. For example, if the sample you're comparing against is 2 seconds long, you might feed the discriminator 0-2s, 0.5-2.5s, 1-3s, etc. You'll need to train the discriminator, but that is a common requirement of any machine learning-based solution.
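
The chunking step described above (0-2s, 0.5-2.5s, 1-3s, ...) can be sketched like this. The classifier itself is left out; the helper name, the deliberately tiny sample rate and the integer "samples" are invented so that the output is easy to check by eye.

```python
def sliding_windows(samples, window_len, hop):
    """Overlapping chunks of window_len samples, each starting hop samples after the last."""
    return [
        samples[start:start + window_len]
        for start in range(0, len(samples) - window_len + 1, hop)
    ]


# Hypothetical setup: 5 seconds of audio at 4 samples/second (unrealistically low)
rate = 4
samples = list(range(5 * rate))

# 2-second windows advanced by 0.5 seconds, as in the 0-2s, 0.5-2.5s, 1-3s example
chunks = sliding_windows(samples, window_len=2 * rate, hop=rate // 2)
print(len(chunks), chunks[0], chunks[1])
```

Each chunk would then be turned into features and passed to the trained classifier; a smaller hop gives finer time resolution at the cost of more classifications.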

These sorts of approaches are about the only way to go if your insect species doesn't have a single, relatively distinct sound that you're looking for. Cross-correlation/convolution are of limited utility if you're looking for something more complex than a single sound which may be at higher or lower volume.

There are naive Bayes classifier implementations for several languages, such as nbc.

#9

You may want a Wiener filter approach.

#10

Google: FastICA algorithm. Some people use the terms ICA and blind-source signal separation interchangeably. The author of the algorithm wrote a fantastic book on ICA that is around $40-$60 used on Amazon.

#11

Goertzel - you can use it both for simple pattern detection and for complicated frequency separation. You can see a sample of my implementation in C#.
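
The C# sample is not reproduced here, so the following is an independent minimal sketch of the standard Goertzel recurrence in Python, not the answerer's implementation. It measures the power at a single target frequency without computing a full FFT; the sample rate, block length and test tone are assumptions chosen so that the target falls exactly on a bin.

```python
import math


def goertzel_power(samples, sample_rate, target_freq):
    """Power of the DFT bin nearest target_freq, via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    # Squared magnitude of the bin, computed from the last two state values
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz
rate, n = 8000, 200
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]

p_on = goertzel_power(tone, rate, 1000)   # frequency that is present
p_off = goertzel_power(tone, rate, 3000)  # frequency that is absent
print(p_on, p_off)
```

Because it needs only a couple of multiplications per sample and two state variables, this kind of detector tends to fit well on small embedded targets when you only care about a few frequencies.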
