Alright, it’s #FutureFridays and today’s topic is OCR (Optical Character Recognition) and AI (Artificial Intelligence).
OCR has been around for quite some time and is widely considered mainstream technology.
OCR is simply the ability to have a machine convert text from a scanned document or image into a machine-readable format.
In a sense, it is a simple form of “Artificial Intelligence,” but because it has become mainstream, most people no longer consider it a subset of AI, even though technically it is, since it falls under Computer Vision.
How OCR really becomes AI depends on what you do with the machine-readable data after the conversion process.
Let me show you an example of how a company took OCR technology to the next level with AI.
Paradatec’s Prosar AIDA Advanced OCR Technology
Video ©Paradatec Inc.
In this video you may have noticed a couple of things happening.
After the documents are scanned and OCR does its work, the software reads the contents and compares them against its training data as a reference.
Training data is the information the model has learned from so far. For example, if you show it thousands and thousands of title documents, it learns what a title document looks like and can recognize one when it sees it.
From there it can determine the document type and classify it accordingly, even for unstructured documents, i.e. ones that don’t follow a template.
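To make that more concrete, here is a minimal sketch of what this kind of document classification can look like under the hood, using scikit-learn. The document types and OCR snippets are made up for illustration; this is not Paradatec’s actual implementation.

```python
# Minimal document-classification sketch (illustrative only, not Paradatec's
# actual technology): train on OCR'd text from known document types, then
# predict the type of a new, untemplated document.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: OCR output paired with human-assigned document types.
ocr_texts = [
    "warranty deed the grantor hereby conveys title to the property",
    "promissory note the borrower promises to pay the principal sum",
    "uniform residential appraisal report estimated market value",
]
doc_types = ["title", "note", "appraisal"]

# TF-IDF turns the raw text into numeric features; logistic regression classifies them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(ocr_texts, doc_types)

new_doc = "the grantor conveys and warrants title to the described real estate"
print(model.predict([new_doc])[0])  # expected: "title"
```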
If the software doesn’t recognize a document, or isn’t sure what it is, it can exclude that document from the workflow and hand it to a human who knows how to identify it and index (manually classify) it accordingly.
This can also “teach” the machine learning algorithm: the human’s corrections can be added to the training data, so that with enough samples the model starts recognizing and classifying documents the way it was taught.
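Here is a rough sketch of that human-in-the-loop pattern: confident predictions pass straight through, uncertain ones go to a person, and the person’s answer is folded back into the training data. The confidence threshold, the `ask_human` helper, and the sample data are all assumptions for illustration.

```python
# Human-in-the-loop sketch: confident predictions are classified automatically,
# uncertain ones are routed to a person, and the person's answer becomes new
# training data. Threshold, helper, and sample data are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

CONFIDENCE_THRESHOLD = 0.80  # arbitrary cutoff for "the machine isn't sure"

# Toy starting point: OCR text paired with human-assigned document types.
texts = ["warranty deed the grantor conveys title",
         "promissory note the borrower promises to pay"]
labels = ["title", "note"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

def ask_human(ocr_text):
    # Stand-in for a real review/indexing queue (hypothetical helper).
    print("Needs manual indexing:", ocr_text[:60])
    return "appraisal"  # pretend this is what the reviewer chose

def classify_or_escalate(ocr_text):
    probabilities = model.predict_proba([ocr_text])[0]
    best = probabilities.argmax()
    if probabilities[best] >= CONFIDENCE_THRESHOLD:
        return model.classes_[best]        # confident: classify automatically
    human_label = ask_human(ocr_text)      # not sure: route to a human
    texts.append(ocr_text)                 # the correction becomes training data
    labels.append(human_label)
    model.fit(texts, labels)               # retrain so the model learns from the human
    return human_label

print(classify_or_escalate("uniform residential appraisal report estimated market value"))
```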
Moreover, the relevant data can be extracted from the documents and passed along to downstream processes.
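For example, a downstream hand-off could be as simple as pulling a few fields out of the OCR’d text and passing them along as structured data. The field names and patterns below are hypothetical, not any vendor’s actual extraction rules.

```python
# Simplified extraction sketch: pull a few fields out of OCR'd text and
# hand them off as structured data. Field names and patterns are
# hypothetical, for illustration only.
import json
import re

ocr_text = """
LOAN NUMBER: 0012345678
BORROWER: JANE Q. PUBLIC
NOTE DATE: 01/15/2021
"""

fields = {
    "loan_number": re.search(r"LOAN NUMBER:\s*(\d+)", ocr_text).group(1),
    "borrower": re.search(r"BORROWER:\s*(.+)", ocr_text).group(1).strip(),
    "note_date": re.search(r"NOTE DATE:\s*([\d/]+)", ocr_text).group(1),
}

# Hand the structured record to whatever comes next; in practice this might be
# an API call or a message queue, here we just print it as JSON.
print(json.dumps(fields, indent=2))
```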
This human-in-the-loop labeling is pretty similar to how Amazon’s AWS SageMaker Ground Truth labeling platform works, which creates training datasets for machine learning.
Introducing AWS SageMaker Ground Truth
Video ©Amazon Web Services
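At its core, Ground Truth collects labels from multiple human workers and consolidates them into a training dataset. Below is a much-simplified illustration of that idea using a majority vote; the real service handles workforces, S3 manifests, and annotation-consolidation logic that aren’t shown here.

```python
# Much-simplified illustration of the Ground Truth idea: several people
# label the same item, the answers are consolidated (here by majority vote),
# and the result becomes a training dataset.
from collections import Counter

# Hypothetical raw annotations: item -> labels from different workers.
raw_annotations = {
    "img_001.jpg": ["dog", "dog", "cow"],
    "img_002.jpg": ["cow", "cow", "cow"],
    "img_003.jpg": ["dog", "cow", "dog"],
}

# Consolidate each item's labels by majority vote.
training_dataset = {
    item: Counter(worker_labels).most_common(1)[0][0]
    for item, worker_labels in raw_annotations.items()
}

print(training_dataset)
# {'img_001.jpg': 'dog', 'img_002.jpg': 'cow', 'img_003.jpg': 'dog'}
```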
Once you learn the terminology, it all makes sense.
Classification = how the AI identifies what the data is and sorts it accordingly.
Labeling = Indexing = how humans identify data, which the machine learning algorithm then learns from.
For example, telling the difference between a dog and a cow, or whether an MRI shows an anomaly or not.
Humans also tend to combine several data sources before making a labeling decision, such as sensor readings, geo-location, thermal imaging, reference materials, etc.
The machine learning algorithm can learn from this as well, so that once it is fed that training data, it can classify the same way.
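As a toy example of combining data sources, imagine each observation mixes a thermal reading, a location, and a motion-sensor flag, and a human supplies the label; the model then learns to classify from the combined features. All of the values below are made up.

```python
# Toy sketch of "combine several data sources before deciding": each row mixes
# a thermal reading, GPS coordinates, and a motion-sensor flag, a human supplies
# the label, and the model learns to classify from the combined features.
# All values are made up.
from sklearn.tree import DecisionTreeClassifier

# Columns: [thermal_reading_c, latitude, longitude, motion_detected]
features = [
    [38.5, 47.61, -122.33, 1],
    [21.0, 47.62, -122.35, 0],
    [39.1, 47.60, -122.30, 1],
    [20.4, 47.63, -122.34, 0],
]
# Human-assigned labels for each combined observation.
labels = ["animal_present", "no_animal", "animal_present", "no_animal"]

model = DecisionTreeClassifier().fit(features, labels)
print(model.predict([[37.9, 47.61, -122.32, 1]]))  # likely "animal_present"
```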
That’s it! I hope you found this to be of value.
If you have any questions or feedback, feel free to reply with a comment. I would love to hear from you.