OCR (Optical Character Recognition) for on-screen text

I'm trying to create a piece of software that automates a PC by capturing a screenshot and then running OCR (Optical Character Recognition) on it to find, for example, a particular button to click. I already have the mouse and keyboard control part, but now I need an OCR engine to process the screenshot. What I have found is that Tesseract OCR does not seem to work very well with on-screen text: either the text is too small, or some characters appear to be connected, for example K and X. How should I go about this?

P.S. This is for an automated test program.

2 solutions

#1

I am not sure if this really fits the bill for you, but some of the better OCR that I have seen in automation is done by Tevron's CitraTest. It includes a library of fonts, and if a font set is not present, they will create a new one based on your submissions. Negative factors with this tool would be cost and the usual issues related to variable screen resolution.

#2

Perhaps look at this question on image enhancement prior to OCR. Otherwise this question is pretty similar to "OCR for .NET".
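To give a rough idea of what "image enhancement prior to OCR" can mean for small screen text, here is a minimal sketch that enlarges a grayscale image with bilinear interpolation and then thresholds it to pure black and white. It works on a plain 2D list of 0-255 values so it has no dependencies; the function names (`upscale_bilinear`, `binarize`, `preprocess`) and the default factor and threshold are my own assumptions, not something Tesseract requires. In a real tool you would probably do the same steps with an imaging library on the actual screenshot.

```python
def upscale_bilinear(pixels, factor):
    """Enlarge a 2D list of grayscale values (0-255) by an integer factor."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(h * factor):
        # Map the centre of the output pixel back into source coordinates.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Blend the four neighbouring source pixels.
            top = pixels[y0][x0] * (1 - fx) + pixels[y0][x1] * fx
            bottom = pixels[y1][x0] * (1 - fx) + pixels[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def binarize(pixels, threshold=128):
    """Map every pixel to 0 (dark) or 255 (light)."""
    return [[0 if v < threshold else 255 for v in row] for row in pixels]


def preprocess(pixels, factor=3, threshold=128):
    """Upscale first, then threshold, so small glyphs get more pixels."""
    return binarize(upscale_bilinear(pixels, factor), threshold)


# Tiny 2x2 "screenshot" just to show the call; a real one would be far larger.
tiny = [[255, 0],
        [0, 255]]
enhanced = preprocess(tiny, factor=3)
```

The enlarged, binarized image is what you would hand to the OCR engine. Whether a factor of 3 and a threshold of 128 help depends on your fonts and colours, so treat both as values to experiment with.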

If you are feeling really bold you can always whip up a simple Perceptron or Neural Network based approach :-)
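For what it's worth, a bare-bones perceptron really is only a few lines. The sketch below is a toy: it learns to tell two hand-drawn 5x5 bitmaps apart (a "K" and an "X", since those were the troublesome letters in the question). The glyph shapes, the function names and the +1/-1 labels are all made up for illustration; a usable recognizer would need character segmentation and many samples per character in your actual screen fonts.

```python
# Hypothetical 5x5 glyphs; '#' is ink, '.' is background.
GLYPH_K = ["#...#",
           "#..#.",
           "###..",
           "#..#.",
           "#...#"]
GLYPH_X = ["#...#",
           ".#.#.",
           "..#..",
           ".#.#.",
           "#...#"]


def flatten(rows):
    """Turn a list of strings into a flat list of 0/1 features."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


def train_perceptron(samples, epochs=100):
    """Classic perceptron rule: nudge the weights on every misclassification."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, label in samples:
            if predict(weights, bias, x) != label:
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
                errors += 1
        if errors == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


# Label K as +1 and X as -1 (an arbitrary choice).
samples = [(flatten(GLYPH_K), 1), (flatten(GLYPH_X), -1)]
weights, bias = train_perceptron(samples)
```

After training, `predict(weights, bias, flatten(some_glyph))` returns 1 for "K-like" input and -1 for "X-like" input. A single perceptron only separates two classes, so covering a whole alphabet would take one per character or a small multi-layer network.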
