analitics

Pages

Sunday, October 20, 2019

Python 3.7.4 : Usinge pytesseract for text recognition.

About this python module named tesseract, you can read here.
I tested with the tesseract tool install on my Fedora 30 distro and python module pytesseract version 0.3.0.
[root@desk mythcat]# dnf install tesseract
Last metadata expiration check: 0:24:18 ago on Sun 20 Oct 2019 10:56:23 AM EEST.
Package tesseract-4.1.0-1.fc30.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
[root@desk mythcat]# whereis tesseract
tesseract: /usr/bin/tesseract /usr/share/tesseract
[mythcat@desk ~]$ pip3 install pytesseract --user
Collecting pytesseract
...
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.0
I test with many images and texts and works very well.
Text images with a printed font are very well recognized.
This test with this image does not have very good accuracy.

The result of the handwriting image.
[mythcat@desk ~]$ python3 ocr_image.py 001.png 
rake Yous mnislakes,
take you chances,
look silby,

bul hep. mv going
dont freeze up