I was wondering if you would be interested in working on a small pilot project I have.
I need a text recognition system that works as follows:
1. Upload documents (usually in .pdf format), for example journal articles, to the system. There is no need to pull anything from databases or to do any web scraping. If you cannot work with the .pdf files directly, it is fine to extract the text from the uploaded document and work with that instead.
2. The system recognises the critical areas of the document and highlights them. This is the only intended outcome. That said, I fully understand that "critical areas" is a subjective term, so we could select one subject area first and define what counts as critical for that area.
3. We could start with a single discipline or subject area, which should make it easier to train your models.
4. I can provide test cases for evaluating your models, e.g. a manually highlighted document to test your model against.
What I need is a system that can highlight the critical areas of a document.
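To make the testing with manually highlighted documents more concrete, here is a rough sketch of one way a model's highlights could be scored against mine. It assumes (my assumption, not a fixed requirement) that both sets of highlights are stored as character ranges `(start, end)` in the extracted text, and it measures character-level overlap as precision, recall and F1. The names `score_highlights` and `Span` are just illustrative; matching whole sentences instead of characters would work equally well.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (start, end) character offsets into the
# extracted text, with `end` exclusive.
Span = Tuple[int, int]


def _to_chars(spans: List[Span]) -> Set[int]:
    """Expand a list of spans into the set of character positions they cover."""
    covered: Set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def score_highlights(predicted: List[Span], reference: List[Span]) -> Dict[str, float]:
    """Character-level precision, recall and F1 of model highlights vs. manual ones."""
    pred = _to_chars(predicted)
    ref = _to_chars(reference)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    manual = [(0, 50), (120, 180)]  # spans I highlighted by hand (made-up example)
    model = [(10, 60), (200, 240)]  # spans the model highlighted (made-up example)
    print(score_highlights(model, manual))
```

Precision tells us how much of what the model highlighted I also considered critical, and recall tells us how much of my highlighting the model found.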