OCR - Getting text from an image with tesseract 3.0 and imagemagick 6.6.5

Time: 2022-06-01 20:54:49

I am trying to build a shell script that lets me search for text in an image. Given the text to search for, the script tries several image transformations until Tesseract recognizes that text. I would like your input on it: the script works with most images, but not with images where the font color of the text is similar to its immediate surroundings.


#!/bin/bash
# imt-ocr.sh is an ImageMagick + Tesseract OCR tool used to find text in an image
# Arguments:
# 1     -- image filename (with path)
# 2     -- text to search for in the image    (defaults to '')
# 3     -- required occurrences of the text   (defaults to 1)
# Usage:
# imt-ocr.sh [image_filename] [text_to_search] [occurrence]

image=$1
txt=$2          # Defaults to ''
occurence=$3    # Defaults to 1
if [ "$occurence" == "" ]
then
        occurence=1
fi

# The original post does not show how the following are set. The scale range
# is inferred from the log output (100%, 200%, 300%); the temporary file name
# is a hypothetical choice (TIFF, which Tesseract 3.0x reads reliably).
start_scale=100
end_scale=300
increment_scale=100
tmp_img="imt-ocr-tmp-$$.tif"

get_major_color ()
{
        # Returns the major (most frequent) color of an image with its hex value
        #       Parameter:      Image filename (with path)
        #       Return format:  Returns a string "hex_val_of_color major_color_name"
        # Histogram lines look like "  count: (r,g,b) #RRGGBB name", so a reverse
        # numeric sort puts the most frequent color on the first line.
        convert "$1" -format %c histogram:info: | sort -rn | head -n 1 | cut -d '#' -f2
}

invert_color ()
{
        # Inverts the color hex value
        #       Parameter:      Hex value to be inverted (RRGGBB, without '#')
        #       Return format:  Returns in hex
        printf '%06X\n' $((0xFFFFFF - 0x$1))
}

print_found_text ()
{
        # Prints the OCR output and removes the temporary files
        echo "IMT-OCR-LOG: Printing out the last text found on image"
        echo "IMT-OCR-LOG: ======================================================"
        cat OUT.txt
        echo "IMT-OCR-LOG: ======================================================"
        rm -f "$tmp_img" OUT.txt
}

for ((scale=start_scale, attempt=1; scale <= end_scale; scale+=increment_scale, attempt++))
do
        echo "IMT-OCR-LOG: Scaling image to $scale% in attempt #$attempt"
        convert "$image" -type Grayscale -scale "$scale%" "$tmp_img"
        tesseract "$tmp_img" OUT
        found_oc=$(grep -o "$txt" OUT.txt | wc -l)
        echo "IMT-OCR-LOG: Found $found_oc occurence(s) of text '$txt' in attempt #$attempt"
        if [ "$found_oc" -ne 0 ] && [ "$occurence" -le "$found_oc" ]
        then
                print_found_text
                exit 0          # Success (the original exited with 1 here)
        fi

        echo "IMT-OCR-LOG: Getting major color of image in attempt #$attempt"
        color_info=$(get_major_color "$image")
        true_val=$(echo "$color_info" | awk '{print $1}')
        true_color=$(echo "$color_info" | awk '{print $2}')
        echo "IMT-OCR-LOG: Major color of image is '$true_color' with hex value of $true_val in attempt #$attempt"

        # Blur the image
        echo "IMT-OCR-LOG: Bluring image in attempt #$attempt"
        convert "$tmp_img" -blur 1x65535 "$tmp_img"

        # Replace the major color with its inverse
        inverted_val=$(invert_color "$true_val")
        echo "IMT-OCR-LOG: Inverting the major color of image from 0x$true_val to 0x$inverted_val in attempt #$attempt"
        convert "$tmp_img" -fill "#$inverted_val" -opaque "#$true_val" "$tmp_img"

        # Sharpen the image
        echo "IMT-OCR-LOG: Sharpening image in attempt #$attempt"
        convert "$tmp_img" -sharpen 1x65535 "$tmp_img"

        # Find text in the processed image
        tesseract "$tmp_img" OUT
        found_oc=$(grep -o "$txt" OUT.txt | wc -l)
        echo "IMT-OCR-LOG: Found $found_oc occurence(s) of text '$txt' in attempt #$attempt"
        if [ "$found_oc" -ne 0 ] && [ "$occurence" -le "$found_oc" ]
        then
                print_found_text
                exit 0
        fi

        rm -f OUT.txt
done

rm -f "$tmp_img"
exit 1                  # Text not found in any attempt

Here is a sample image that shows the problem (test.jpeg): http://www.igoipad.com/wp-content/uploads/2012/07/03-Word-Collage-iPad.jpeg


[admin@ba-callgen image-magick-tesseract-processing]$ sh imt-ocr.sh test.jpeg Common
IMT-OCR-LOG: Scaling image to 100% in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Common' in attempt #1
IMT-OCR-LOG: Getting major color of image in attempt #1
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #1
IMT-OCR-LOG: Bluring image in attempt #1
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #1
IMT-OCR-LOG: Sharpening image in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Common' in attempt #1
IMT-OCR-LOG: Scaling image to 200% in attempt #2
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 1 occurence(s) of text 'Common' in attempt #2
IMT-OCR-LOG: Printing out the last text found on image
IMT-OCR-LOG: ======================================================
Settings M...
Common words
Exclude numbers
word case
Theme & Layuul
Color theme
Word layout
Clrien lalion
Lrmclsc ape
Ergl sw v.-ords >
li( `
I):Jntc1'\:1r\qa )
Landon Spring >
Hough Trad >
H3'fJ|1d :-Ialf >

IMT-OCR-LOG: ======================================================
[admin@ba-callgen image-magick-tesseract-processing]$ 
[admin@ba-callgen image-magick-tesseract-processing]$ sh imt-ocr.sh test.jpeg Portrait
IMT-OCR-LOG: Scaling image to 100% in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #1
IMT-OCR-LOG: Getting major color of image in attempt #1
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #1
IMT-OCR-LOG: Bluring image in attempt #1
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #1
IMT-OCR-LOG: Sharpening image in attempt #1
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #1
IMT-OCR-LOG: Scaling image to 200% in attempt #2
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #2
IMT-OCR-LOG: Getting major color of image in attempt #2
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #2
IMT-OCR-LOG: Bluring image in attempt #2
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #2
IMT-OCR-LOG: Sharpening image in attempt #2
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #2
IMT-OCR-LOG: Scaling image to 300% in attempt #3
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #3
IMT-OCR-LOG: Getting major color of image in attempt #3
IMT-OCR-LOG: Major color of image is 'grey96' with hex value of F5F5F5 in attempt #3
IMT-OCR-LOG: Bluring image in attempt #3
IMT-OCR-LOG: Inverting the major color of image from 0xF5F5F5 to 0x0A0A0A in attempt #3
IMT-OCR-LOG: Sharpening image in attempt #3
Tesseract Open Source OCR Engine with Leptonica
IMT-OCR-LOG: Found 0 occurence(s) of text 'Portrait' in attempt #3
[admin@ba-callgen image-magick-tesseract-processing]$ 

As you can see, the script finds the text "Common" but not "Portrait". The reason is the font color of "Portrait". Any help improving this script would be appreciated.


I am using CentOS 5.


1 answer



Do not artificially limit yourself to evaluating just one or two methods when you manipulate your input image. Right now you seem to rely mostly on -blur and -scale.


You should also consider using the following operations:


  • -contrast
  • -despeckle
  • -edge
  • -negate
  • -normalize
  • -posterize
  • -type grayscale
  • -monochrome
  • -gamma
  • -antialias / +antialias
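One way to act on this advice is to let the script try several preprocessing variants in turn instead of one fixed chain. The following is a minimal sketch under stated assumptions: `try_variants`, `count_occurrences` and the `VARIANTS` list are hypothetical names, and the operator combinations are illustrative starting points, not tuned values. It reuses the same `tesseract IMAGE OUTBASE` call and `grep -o | wc -l` counting as imt-ocr.sh.

```shell
#!/bin/bash
# Hypothetical helper: try several ImageMagick preprocessing variants and stop
# at the first one whose Tesseract output contains the search text.

# count_occurrences FILE TEXT: same "grep -o | wc -l" counting as imt-ocr.sh
count_occurrences () {
    grep -o -- "$2" "$1" 2>/dev/null | wc -l
}

# Each entry is a list of convert options applied after scaling.
# These combinations are illustrative assumptions, not tuned values.
VARIANTS=(
    "-type Grayscale -normalize"
    "-type Grayscale -contrast -contrast -normalize"
    "-type Grayscale -negate -normalize"
    "-despeckle -type Grayscale -monochrome"
    "-type Grayscale -negate -monochrome"
)

# try_variants IMAGE TEXT [SCALE_PERCENT]
# Returns 0 as soon as one variant yields TEXT, 1 if none does.
try_variants () {
    local image=$1 txt=$2 scale=${3:-200}
    local tmp="variant-$$.tif" opts n
    for opts in "${VARIANTS[@]}"; do
        # $opts is deliberately unquoted so it splits into separate options.
        convert "$image" -scale "$scale%" $opts "$tmp" || continue
        tesseract "$tmp" variant-out >/dev/null 2>&1 || continue
        n=$(count_occurrences variant-out.txt "$txt")
        echo "variant [$opts]: $n occurrence(s) of '$txt'"
        if [ "$n" -gt 0 ]; then
            rm -f "$tmp" variant-out.txt
            return 0
        fi
    done
    rm -f "$tmp" variant-out.txt
    return 1
}
```

For example, `try_variants test.jpeg Portrait 300` prints one line per variant tried and stops at the first variant whose OCR output contains "Portrait", if any does. Keeping the variants in an array makes it easy to add or reorder combinations without touching the loop.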

Input image: 03-Word-Collage-iPad.jpeg (the sample image linked in the question; not reproduced here)

For example, see what this command produces:


convert 03-Word-Collage-iPad.jpeg             \
    -scale 1000%                              \
    -blur 1x65535 -blur 1x65535 -blur 1x65535 \
    -contrast                                 \
    -normalize                                \
    -despeckle -despeckle                     \
    -type grayscale                           \
    -sharpen 1                                \
    -posterize 3                              \
    -negate                                   \
    -gamma 100                                \
    -compress zip                             \
    a.tif

Output image: a.tif (not reproduced here)
(Sorry: when I upload a TIFF to this website, it gets auto-converted to PNG. So downloading the posted image does not give you my actual TIFF, but it is still a close-enough picture of my real result.)


Note 1: I tested this with this ImageMagick version:


convert -version
  Version: ImageMagick 6.7.6-9 2012-05-12 Q16 http://www.imagemagick.org
  Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC

Note 2: Older or newer versions of ImageMagick may behave differently, especially when it comes to -posterize!


And this is the result of Tesseract's OCR for a.tif:


tesseract a.tif OUT  &&  cat OUT.txt

Tesseract Open Source OCR Engine v3.01 with Leptonica
   Page 0
   Common words Remove English words >
   Exclude numbers
   Word case Don't change 1+
   Theme & Layout
   Color theme London Spring >
   Font Rough Trad >
   Word layout Half and Half >


I verified that the most recent ImageMagick version, 6.7.9-0 (released yesterday), does not produce exactly the same result as the command and screenshot above, which were made with version 6.7.6-9. Here is the difference:


[Image: difference between the 6.7.6-9 and 6.7.9-0 results; not reproduced here]
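Because the result depends on the ImageMagick version, it can help to log the installed version next to any OCR result. A minimal sketch, assuming `convert -version` prints a first line of the form "Version: ImageMagick 6.7.6-9 2012-05-12 Q16 ..." as shown above; `check_im_version` and `TESTED_IM_VERSION` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: print the installed ImageMagick version and warn when it
# is not 6.7.6-9, the version the answer's command was tested with.
TESTED_IM_VERSION="6.7.6-9"

check_im_version () {
    local line
    # First line looks like "Version: ImageMagick 6.7.6-9 2012-05-12 Q16 ..."
    line=$(convert -version | head -n 1)
    echo "Installed: ${line#Version: }"
    case "$line" in
        *"ImageMagick $TESTED_IM_VERSION "*)
            echo "Same version as tested ($TESTED_IM_VERSION)."
            return 0 ;;
        *)
            echo "Not $TESTED_IM_VERSION: results (especially -posterize) may differ."
            return 1 ;;
    esac
}
```

Calling `check_im_version` before running the preprocessing makes it obvious when parameter values copied from this answer may need re-tuning.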

In any case, I'm confident that if you tweak my command a bit and experiment with its parameters, you'll get it to work with whatever ImageMagick version you have.
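That tweaking can be automated. The sketch below wraps the answer's command in a hypothetical `answer_pipeline` function with -scale, -posterize and -gamma exposed as parameters, and a hypothetical `sweep_parameters` function tries a small grid of values until Tesseract's output contains the search text. The value grids are arbitrary assumptions for illustration; every combination runs convert and tesseract once, so widen them with care (large -scale values already produce large images).

```shell
#!/bin/bash
# answer_pipeline IN OUT SCALE LEVELS GAMMA
# The answer's convert command with -scale, -posterize and -gamma made configurable.
answer_pipeline () {
    convert "$1" \
        -scale "$3%" \
        -blur 1x65535 -blur 1x65535 -blur 1x65535 \
        -contrast \
        -normalize \
        -despeckle -despeckle \
        -type grayscale \
        -sharpen 1 \
        -posterize "$4" \
        -negate \
        -gamma "$5" \
        -compress zip \
        "$2"
}

# sweep_parameters IMAGE TEXT
# Prints the first "scale posterize gamma" triple whose OCR output contains
# TEXT and returns 0; returns 1 if no combination works.
# The grids below are illustrative assumptions, not recommended values.
sweep_parameters () {
    local image=$1 txt=$2 scale levels gamma
    local tmp="sweep-$$.tif"
    for scale in 400 700 1000; do
        for levels in 2 3 4; do
            for gamma in 1 10 100; do
                answer_pipeline "$image" "$tmp" "$scale" "$levels" "$gamma" || continue
                tesseract "$tmp" sweep-out >/dev/null 2>&1 || continue
                if grep -q -- "$txt" sweep-out.txt; then
                    echo "$scale $levels $gamma"
                    rm -f "$tmp" sweep-out.txt
                    return 0
                fi
            done
        done
    done
    rm -f "$tmp" sweep-out.txt
    return 1
}
```

For example, `sweep_parameters test.jpeg Portrait` prints the first matching triple, if any, which can then be fed back into the answer's command by hand.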



