Hi,
I have worked on many projects involving OCR.
Tesseract is the most widely used free OCR engine, and its recognition accuracy is decent.
I have a few questions about the project:
Are the statements scanned from paper, or are they ordinary images generated by other software?
Also, do you have an estimate of the number of different formats, or is it completely open and likely to vary widely?
Are the labels/descriptions always the same, or at least predictable, such as "Account name", "Account number", "Date", "Balance", etc.?
I have many other questions, but I am most likely well suited to this project and can guarantee you maximum efficiency and stability.
I know all the intricacies of Google's Tesseract engine, including nearly all of the small bugs that can be worked around with a few hacks, such as confusing "B" with "8" and "0" with "O", along with some other very subtle details.
I can dedicate myself full-time to the project until you are fully satisfied.
P.S.
In most cases the statements are in PDF format, as with the example statements found on Google; this will make things easier.
The app will handle any format: PDF or any kind of image (BMP, TIFF, PNG, or any other format you can imagine).
I hope to work for you soon.
Nassef Knani