Recognizing Chinese characters with Tesseract OCR

Date: 2022-12-05 19:31:03

I have been using the Tesseract 3.0.2 OCR SDK to extract text from images. However, when I run OCR on images containing Chinese text, Tesseract returns digits and English letters instead of Chinese characters. I need the output to contain the Chinese characters shown in the image.

How can I achieve this? Is there any way I can obtain Chinese characters rather than any other characters?

1 Answer

#1

You need to download the Chinese trained data (a file such as chi_sim.traineddata, which covers Simplified Chinese) and add it to your tessdata folder.

You can download the file from https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
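
A missing or misplaced language file is an easy mistake to make at this step, so it can help to check the project folder before building. The following is only a minimal sketch in Python, not part of the Tesseract SDK: the function name find_traineddata is hypothetical, and it assumes the data folder is named tessdata and the language code is chi_sim, as in this answer.

```python
from pathlib import Path


def find_traineddata(tessdata_dir, language="chi_sim"):
    """Return the path of <language>.traineddata inside tessdata_dir.

    Hypothetical helper for illustration only.
    Raises FileNotFoundError if the file is missing or empty.
    """
    path = Path(tessdata_dir) / f"{language}.traineddata"
    if not path.is_file():
        raise FileNotFoundError(f"missing language file: {path}")
    if path.stat().st_size == 0:
        # An empty file usually means the download was interrupted.
        raise FileNotFoundError(f"language file is empty: {path}")
    return path


if __name__ == "__main__":
    # Assumes the script is run from the folder that contains tessdata/.
    try:
        print("found", find_traineddata("tessdata"))
    except FileNotFoundError as err:
        print(err)
```

Run it from the directory that contains the tessdata folder. It only confirms that the file is present and non-empty; it does not confirm that the file is compatible with your Tesseract version.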

Then use it like this:

Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"chi_sim"];

If you have any problems, you can download my Tesseract experiment (with Chinese language support) from https://github.com/aryansbtloe/ExperimentWithTesseract.git

I have tested this approach. I hope you find it useful.
