Toloka Team
How to recognize handwritten text from an image
Digitized texts are easier to access and to use for purposes beyond reading, such as quick information searches. An enormous quantity of texts has accumulated since the invention of paper. Toward the end of the twentieth century, once it became apparent that digitizing them at scale was only feasible with automatic methods, optical character recognition (OCR) technology began to be actively developed.
OCR examines scanned images of printed text and transforms those images into digital text. Though the most sophisticated OCR models can identify almost every font type, classic OCR is built for printed text and does not cope with handwritten data on its own.
To recognize handwritten text from images, use OCR software with handwriting support. One option is a mobile application with OCR features. Another is to scan the handwritten text and use a desktop or online OCR-powered application. Finally, some scanners come with OCR software, either built in or available as a downloadable tool from the hardware manufacturer.
Handwritten text recognition (HTR) is a computer-assisted, automated approach to deciphering handwritten records. It offers a great opportunity to automate the workflows of many businesses and to reduce manual work. OCR and HTR are very similar technologies, but OCR is already in an advanced state, whereas HTR is still in an early phase.
The simplest and still widely used text recognition process is matrix matching: each letter in the input image is decomposed into a pixel matrix and compared with the matrices stored in the computer. When the matrices match, the character is considered recognized. This method is also referred to as pattern matching and is mostly used for printed text. In other words, OCR applies it to recognize characters one by one.
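As a rough illustration of matrix matching, the sketch below compares a small binary glyph with stored reference bitmaps and picks the one with the most matching pixels. The 5x3 templates, the `match_score` and `recognize` helpers, and the "best score wins" rule are all assumptions made for this example; real OCR engines use much larger matrices and more tolerant comparisons.

```python
# Hypothetical 5x3 reference bitmaps ("1" = ink, "0" = background).
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def match_score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for glyph_pixel, template_pixel in zip(glyph_row, template_row):
            total += 1
            same += glyph_pixel == template_pixel
    return same / total


def recognize(glyph):
    """Return the label of the stored template that agrees best with the glyph."""
    return max(TEMPLATES, key=lambda label: match_score(glyph, TEMPLATES[label]))


# A slightly noisy "L": one background pixel has been flipped to ink.
noisy_l = ["100", "100", "110", "100", "111"]
print(recognize(noisy_l))
```

Because the comparison is pixel by pixel, the approach tolerates a little noise but breaks down as soon as the shape, size, or slant of a character differs from the stored template.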
For handwritten text and rare or nonstandard fonts, the conventional comparison of pixel matrices may not be applicable at all. A slightly modified approach is employed in this case: recognizing separate features, such as lines, curves, and other parts of letters. This method is called feature extraction or feature detection and is used to identify both typed and handwritten text.
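Feature extraction can take many forms. As a simple stand-in for detecting lines and curves, the sketch below uses zoning: the glyph is divided into a grid, and the share of ink in each zone becomes one feature. The `zone_densities` function and the 2x2 grid are assumptions chosen for brevity, not the method of any particular recognition engine.

```python
def zone_densities(glyph, zone_rows=2, zone_cols=2):
    """Split a binary glyph into a grid of zones and return the ink fraction of each zone."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for zone_row in range(zone_rows):
        for zone_col in range(zone_cols):
            top = zone_row * height // zone_rows
            bottom = (zone_row + 1) * height // zone_rows
            left = zone_col * width // zone_cols
            right = (zone_col + 1) * width // zone_cols
            cells = [glyph[r][c] for r in range(top, bottom) for c in range(left, right)]
            features.append(sum(cells) / len(cells))
    return features


# A 4x4 "L" shape (1 = ink, 0 = background).
glyph_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(zone_densities(glyph_l))
```

Two differently written versions of the same letter tend to produce similar feature vectors even when their pixels do not line up exactly, which is why feature-based methods are more forgiving than direct matrix comparison.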
Modern text recognition technologies
Optical character recognition
OCR is the process of retrieving text from a picture. An image of a page is a digital copy of its text and any other content. Such images are obtained by scanning or photographing paper documents, books, letters, and so on.
These images do not yet contain editable text. Instead, they are sets of pixels that collectively form the pattern of the text. Optical recognition converts the picture into text that can be edited on a PC, without having to retype it by hand.
Technologies such as intelligent character recognition (ICR) and intelligent word recognition (IWR) are advanced subtypes of standard optical character recognition (OCR) systems. They target handwritten rather than printed text and are incorporated into most modern recognition systems.
Intelligent character recognition
ICR is an improved form of OCR or, more precisely, a type of handwritten text detection. It deals with the recognition of separate hand-printed characters. ICR software works on individual characters, splitting each symbol into elements such as lines, curves, or loops to identify exactly which character it is.
This method has its limitations: ICR tools work best with highly structured, that is, evenly arranged characters. Examples include forms such as questionnaires in which a person writes information in fields reserved for individual letters. This kind of form is found, for example, in tests where the correct answer or letter must be written in dedicated boxes.
Modern ICR software often features a self-learning capability: a neural network that automatically updates the recognition database as it encounters new handwriting styles. This expands the document processing capabilities of OCR and HTR. Nevertheless, ICR does not yet perform cursive handwriting recognition, since it can only detect individually written characters.
Intelligent word recognition
ICR has an evolution of its own, called intelligent word recognition. It is used for unstructured, freehand, or cursive handwriting, and it attempts to recognize the entire word rather than individual characters.
This process is best suited to free-form handwritten notes, since it identifies whole coherent words or phrases rather than individual characters. IWR was not designed to replace ICR and OCR. On the contrary, today's applications combine all three approaches.
IWR is intended to recognize real texts written by humans in cursive, which is often hard to read. Such handwritten notes cannot be recognized by ICR because of the nature of that method. IWR greatly reduces the errors made by typical recognition systems because it matches handwritten or printed words to a user-defined dictionary.
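The dictionary-matching idea can be sketched with Python's standard `difflib` module. The `LEXICON` list and the `correct_word` helper are hypothetical names introduced for this example, and fuzzy string matching is used here only as a simple stand-in for whatever matching a real IWR system performs.

```python
import difflib

# Hypothetical user-defined dictionary for a form-processing task.
LEXICON = ["invoice", "payment", "amount", "account", "address"]


def correct_word(raw, lexicon=LEXICON, cutoff=0.6):
    """Replace a raw recognition result with its closest dictionary entry, if any."""
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    # Keep the raw result when nothing in the dictionary is similar enough.
    return matches[0] if matches else raw


print(correct_word("lnvoice"))  # the recognizer confused "i" with "l"
print(correct_word("qqqq"))     # nothing similar in the dictionary
```

The `cutoff` value controls how aggressively raw output is snapped to dictionary words. A value that is too low risks replacing correct but unlisted words, so it would need tuning on real data.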
Methods for recognizing human written texts
Every handwritten letter, although each person writes it differently, still comprises the same parts. However, handwritten characters can vary in appearance far more than printed ones. The main task is therefore to find the characteristic features of each letter in the input text in order to recognize it.
Such tasks are handled by neural networks. A neural network is a type of machine learning model that consists of many simple mathematical operations of the same nature. Neural networks are actively employed today to convert scanned handwriting into printed text.
Before a neural network can recognize text efficiently, it has to be trained. It learns to find patterns using labeled data: the algorithm processes the input data over and over again, adjusting its categorization until clear patterns emerge.
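To make the idea of learning from labeled data concrete, here is a minimal sketch of a single artificial neuron (a perceptron) trained to tell vertical strokes from horizontal ones in 3x3 images. It is far simpler than the networks used for real handwriting, and the toy data, the `train_perceptron` and `predict` names, and the hyperparameters are all assumptions made for illustration.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Repeatedly pass over labeled samples, nudging the weights after every mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted
            if error:
                mistakes += 1
                bias += learning_rate * error
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


def predict(weights, bias, features):
    """Return 1 (vertical stroke) or 0 (horizontal stroke)."""
    return 1 if bias + sum(w * x for w, x in zip(weights, features)) > 0 else 0


# Flattened 3x3 images: label 1 = vertical stroke, label 0 = horizontal stroke.
samples = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # vertical, middle column
    [0, 0, 0, 1, 1, 1, 0, 0, 0],  # horizontal, middle row
    [1, 0, 0, 1, 0, 0, 1, 0, 0],  # vertical, left column
    [1, 1, 1, 0, 0, 0, 0, 0, 0],  # horizontal, top row
]
labels = [1, 0, 1, 0]

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, s) for s in samples])
```

With only four examples the neuron fits its training data but should not be expected to generalize to strokes in other positions, which is one reason real models need large labeled datasets.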
ML models that can handle such a challenge require a considerable amount of training data. In many cases, such data has already been prepared and is publicly available. For example, the MNIST dataset contains 70,000 images of handwritten digits, and convolutional neural networks trained on it reach recognition accuracy above 99%.
Neural networks can analyze vast amounts of information that a human being would not be able to process. They filter massive volumes of data at high speed, capturing patterns that would otherwise escape attention. The neural network approach includes numerous techniques. Among the most popular are convolutional neural networks (CNN), recurrent neural networks (RNN), and Hopfield networks.
The behavior of a neural network cannot be programmed directly; it can only be trained. This is considered the primary advantage of neural networks: they can make predictions with a level of certainty without a human programmer telling them what to do in each specific situation.
Nowadays, a significant number of text recognition libraries exist, and using them greatly simplifies the development of handwriting recognition models. To improve recognition accuracy, a dataset may be assembled for a specific purpose, such as particular image characteristics or a specific language.
Handwriting recognition
Modern models of printed text recognition deliver rather high-quality results, converting an input image to text with relatively few errors. However, these results are possible because printed text uses a limited set of fonts, which are designed to be as legible to humans as possible.
All typographic fonts have roughly the same outlines. More often than not, they are clearly legible and differ only slightly in style. For example, some people do not see the difference between Arial and Calibri, even though stylistically they are not the same. Technically, it is easier to teach a computer to recognize fonts of this type, because the shapes that make up their letters are mostly similar.
Handwritten text recognition is a more complicated matter. Everyone has their own handwriting, which may even change over time, so the variability of handwriting patterns is substantial. Over a lifetime, a person may develop a habit of writing a particular character in a way that no one else does.
Training a handwritten text recognition model involves creating a dataset, as mentioned earlier, which is not an easy task in itself. On top of that, there is the difficulty of labeling the gathered data.
For instance, recognizing a historical document sometimes requires a specialist who knows how people used to write. If the handwritten text is very intricate, it may take two or more people to interpret it and label each letter correctly. Even for simple datasets, each item should be annotated by several people so that the errors annotators often make when labeling handwritten text can be corrected.
How to convert scanned handwriting?
With the appropriate software, you can easily convert handwritten text into printed text. Converting scanned pictures or photos into text involves the following steps:
Image processing
To convert scanned handwriting into printed text, the input image must first be stripped of noise and converted to a form that enables efficient character extraction and detection. Generally, the image is enhanced, its contrast is increased, it is straightened, and it is converted into the format used by the system.
Binarization by thresholding plays an essential role here: it transforms a color or grayscale image into black and white. This conversion distinctly separates the text from the background, simplifies the application of many later algorithms, and also removes some noise from the image.
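One common way to choose the threshold automatically is Otsu's method, which picks the gray level that best separates dark and light pixels. The article does not name a specific thresholding algorithm, so the sketch below is only one possible choice; the `otsu_threshold` and `binarize` helpers and the tiny sample image are assumptions, and dark pixels are assumed to be ink.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes the between-class variance."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        weight_light = total - weight_dark
        if weight_dark == 0 or weight_light == 0:
            continue  # one of the two classes is empty at this level
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(image):
    """Map a grayscale image (rows of 0-255 integers) to 1 = ink, 0 = background."""
    threshold = otsu_threshold([pixel for row in image for pixel in row])
    return [[1 if pixel <= threshold else 0 for pixel in row] for row in image]


# Tiny grayscale sample: three dark ink pixels on light paper.
gray = [
    [250, 245, 240, 252],
    [248, 30, 25, 247],
    [251, 35, 243, 249],
]
for row in binarize(gray):
    print(row)
```

A single global threshold works well for evenly lit scans. Photos with shadows or uneven lighting usually call for a locally adaptive threshold instead.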
Highlighting the area of interest
This step highlights the area of the image that contains the text to be recognized. In other words, a specialist has to detect handwriting in an image, while discarding elements that are not text. These include such objects as smudges and stains on the paper that were not removed during the binarization process.
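A minimal sketch of this step, assuming the image has already been binarized: group ink pixels into connected blobs, drop blobs too small to be writing, and take the bounding box of what remains. The `text_bounding_box` function, the `min_pixels` cutoff, and the sample page are hypothetical; real systems use more robust criteria than blob size alone.

```python
def text_bounding_box(binary, min_pixels=3):
    """Return (top, left, bottom, right) around ink blobs of at least min_pixels pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    kept = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one 4-connected blob of ink pixels.
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(blob) >= min_pixels:  # smaller blobs are treated as specks
                kept.extend(blob)
    if not kept:
        return None
    rows = [y for y, _ in kept]
    cols = [x for _, x in kept]
    return min(rows), min(cols), max(rows), max(cols)


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],  # isolated speck left over after binarization
    [0, 0, 0, 0, 0, 0],
]
print(text_bounding_box(page))
```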
Segmentation into lines and characters
A text image has to be separated into lines, the lines into words, and the words into characters before the optical character recognition system can process each character individually.
Since handwritten text, unlike typewritten text, generally follows a curve rather than a straight line, difficulties may arise in dividing input images of handwritten text into lines, so the algorithms suitable for typewritten text cannot be applied directly. Lines may bend or sit too close together, and text elements belonging to different lines may overlap.
Baseline extraction methods attempt to trace the imaginary line along which a person writes and then reconstruct the text line from it. After this step, different recognition systems employ their own algorithms.
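For comparison, the straight-line segmentation that works for typewritten text can be sketched as a horizontal projection: every run of rows that contain ink is treated as one text line. The `split_into_lines` helper and the sample page are assumptions for illustration. This is the simple approach that, as noted above, cannot be applied directly to handwriting.

```python
def split_into_lines(binary):
    """Return (first_row, last_row) for each run of consecutive rows that contain ink."""
    lines = []
    start = None
    for index, row in enumerate(binary):
        has_ink = sum(row) > 0
        if has_ink and start is None:
            start = index  # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, index - 1))  # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(binary) - 1))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(split_into_lines(page))
```

The method relies on at least one blank row between lines. When handwritten lines bend or their ascenders and descenders overlap, no such blank row exists and two lines merge into one, which is the problem baseline extraction tries to address.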
Symbol processing
The symbol image may be processed as a whole by comparing it to the available templates. Alternatively, the characteristics of the depicted symbol are extracted: the relevant features are selected and then classified according to the criteria defined in the application.
Recognition result
The output consists of the possible versions of the letter. Generally, however, the recognition system keeps working and applies other methods to refine the result. A recognition engine may not follow all of the steps mentioned, but the basic actions of the recognition process are shared by all algorithms.
Making a handwriting recognition model
For all of the steps described above to be possible with handwritten text, a trained handwriting recognition model has to be created. The following are the basic steps for creating such a model.
Data gathering
The first thing specialists have to do is collect a training dataset of images that contain words written in different handwriting in the language they plan to work with. It may include photos, scans of handwritten notes, scanned documents, letters, and so on.
Model developers can use ready-made datasets, many of which are now freely available. Alternatively, they can build their own. For example, they can distribute dedicated word-writing forms to a large group of people, such as students or colleagues, in order to cover as many handwriting styles as possible.
A faster solution would be to gather training data through crowdsourcing. This relatively new approach is an effective tool for gathering vast amounts of data. On crowdsourcing platforms, the customer gives the assignment to a large group of people, most often freelancers, who will submit images of handwritten text for a small fee.
Annotation
An image of a document, including a handwritten one, is not yet a text document. The human brain is designed in such a way that a glance at a sheet of paper is enough to understand what is written on it (depending on the handwriting, of course, since some is incomprehensible even to humans). From the computer's point of view, a scanned document is just a set of colored dots and does not look like a text document at all. The model cannot extract relevant features on its own.
Therefore, the data collected has to be annotated, because the model cannot learn to identify letters in the image on its own. Instead, it needs to be shown how the handwritten symbol corresponds to the printed letter so that in the future it can extract text from handwritten notes and help people recognize similar characters. As already mentioned, it takes more than one person to get a better result and avoid annotation errors.
This is where crowdsourcing comes in handy again. The final decision on assigning a particular letter to an image is reached through the agreement of contributors located all over the world. To filter out unscrupulous contributors, it is essential to create high-quality control tasks against which a person's expertise can be evaluated.
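Agreement between contributors can be computed in several ways. A minimal sketch, assuming simple majority voting with a minimum agreement level, is shown below. The `aggregate_labels` function and the 0.6 cutoff are hypothetical and are not a description of how any particular crowdsourcing platform aggregates answers.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Return the majority label, or None when too few contributors agree."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


print(aggregate_labels(["a", "a", "o"]))       # two of three agree
print(aggregate_labels(["a", "o", "u", "e"]))  # no agreement
```

Images for which the function returns `None` could be sent to additional contributors or to an expert for review.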
Model training
Once all photos, scans, and documents containing human written text have labels, specialists can begin training the model. As a final result of the training, the recognition model has to be able to provide a reliable output: a text file in a digital format.
Moreover, the text must be of high quality; a set of incoherent letters will not do. If the output is incoherent, ML specialists may conclude that the dataset was of poor quality, that it was improperly labeled, or that the training process was faulty. Ideally, all cases of duplicated and unrecognized characters should be resolved.
One way or another, once recognition problems are detected, specialists have to go back and carefully examine every step of model preparation to determine exactly where the failure occurred.
Quality assurance and model monitoring
After achieving high-quality recognition of written text by the model, developers must not forget continuous quality control. This step is necessary to guarantee that the model's performance is always of excellent quality and that it delivers the best possible result over a long period of time.
Furthermore, this stage of work can indicate to the development team that the model has to be further trained with additional datasets containing new images of various handwriting.
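One metric commonly used to track recognition quality over time is the character error rate (CER): the edit distance between the model's output and a human-verified transcription, divided by the length of that transcription. The article does not prescribe a metric, so the sketch below is an assumption, as are the `edit_distance` and `character_error_rate` helper names.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(character_error_rate("hello", "hallo"))
```

Computing this value regularly on a fixed set of verified samples would make a drop in quality visible and could signal that the model needs further training.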
Applications of handwritten text detection
OCR and HTR systems are employed in countless fields. Some of the tasks that text recognition systems solve are as follows:
Generation of digital versions of printed and handwritten documents.
Data reading on forms and questionnaires.
Automated vehicle license plate recognition.
Technology to assist blind and visually impaired individuals.
ID documents data recognition.
Extraction of information from business cards into contact lists.
Handwritten text detection may also be used for quick editing of one's notes and memos. When you write notes in class, you can take a picture of them and generate text on your computer that can be edited and modified. Handwriting recognition simplifies and speeds up paperwork in hospitals and in government institutions that provide services for citizens. For writers who draft their books by hand and then retype the finished text, this automated process can make the job a lot easier.
Such technology can simplify the job of historians who decipher historical documents that are written by hand. Some projects involve deciphering old books and ancient manuscripts. People decode photos or scans of such books by hand, which is often a complicated process. Very few people know how to do this. If a computer could do it, it could speed up the process dramatically. Machine learning could make this job a lot easier.
These are certainly not all areas of application for handwritten text detection. So, the development of handwriting recognition technologies that can greatly simplify the process of data entry is an essential task for many users.
Summing up
Currently, quite a few types of OCR systems exist, but only some of them can recognize handwriting. Recognition systems with high speed and accuracy are typically very expensive to create, so few online OCR services are available for mass use beyond the existing ones from major players.
Text recognition can significantly improve and simplify the work of many people in various fields and institutions. Some development teams have already achieved an impressive level of handwriting recognition. However, many software solutions commonly available on the market do not fully solve the task. Therefore, the challenge of developing a system that is accessible to a wide range of people and that can recognize handwritten characters remains urgent. Thanks to advances in machine learning and neural networks, this task no longer seems unfeasible.
Updated:
Feb 15, 2023