How to Convert Handwritten Documents to Text

If you have a pile of handwritten documents that need to be digitized for easy editing, sharing, and storage, handwritten text recognition (HTR) technology can help. With it, you can convert handwritten documents to text in a few simple steps, and all you need is a scanner and software.

The Challenge of Scanning Handwritten Documents

Scanning handwritten documents and converting them to digital text can be a real pain, as it comes with a unique set of challenges:

  • Handwriting varies from person to person, making it difficult for standard Optical Character Recognition (OCR) software to recognize and transcribe the text accurately.
  • Handwritten documents often contain errors, such as crossed-out words and misspellings, which can further confuse scanning software.
  • Many documents that have been written by hand are old, and the quality of the paper, the ink used, and even the presence of stray marks or folds can further complicate the scanning process.

To cope with these and other challenges, software developers have created specialized Handwritten Text Recognition (HTR) software, designed specifically for converting handwritten documents to text.

HTR tools use advanced algorithms to adapt to different handwriting styles, differentiate between intentional text and stray marks or corrections, and deal with old or damaged documents.

Convert Handwritten Documents to Text Using Transkribus

There are many HTR tools for converting handwritten documents to text, but the one I recommend most is Transkribus. It's an online tool with a desktop version that's easy to pick up, and you can train it to improve its performance.

Out of the box, the results with Transkribus may be underwhelming. However, the real power of this tool lies in its training interface. With some time and effort, you can train Transkribus to recognize your handwriting more accurately, which can significantly improve the quality of the transcription.

The free version of Transkribus lets you convert up to 100 documents and perform up to five training runs a month (more about them soon). To get started, visit the tool’s website, click the Try for free button, and create a user account.

To begin converting your document, open the default collection in Transkribus. Think of collections as folders where you can organize your work, with each collection containing individual documents. Each document is composed of images that represent the actual pages of your text.
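The collection, document, and page hierarchy described above can be pictured as a small data model. The sketch below is only an illustration: the class names, field names, and sample file names are hypothetical and are not part of any Transkribus API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """One scanned image, i.e. a single page of handwritten text."""
    image_file: str


@dataclass
class Document:
    """A document is an ordered list of page images."""
    title: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Collection:
    """A collection works like a folder that groups documents."""
    name: str
    documents: List[Document] = field(default_factory=list)

    def page_count(self) -> int:
        # Total number of page images across every document in the collection.
        return sum(len(doc.pages) for doc in self.documents)


# Hypothetical example data: one collection holding a two-page letter.
letter = Document("Letter", [Page("letter_p1.jpg"), Page("letter_p2.jpg")])
default_collection = Collection("Default collection", [letter])
```

Here `default_collection.page_count()` counts the images in every document, which is the number of pages a recognition job over the whole collection would have to process.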

To add your document, click the Upload Files button. Transkribus accepts various formats, such as JPEGs, PNGs, and PDFs, but for optimal recognition, it recommends using 300 DPI JPEGs. Once your documents are uploaded, you’re ready to convert the handwritten document to text.
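A resolution of 300 DPI means 300 pixels per inch of paper, so you can check whether a scan meets the recommendation from its pixel dimensions and the physical page size. The following is a rough sketch under that assumption; the function names and the sample dimensions are made up for illustration.

```python
def effective_dpi(width_px: int, height_px: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Return the lower of the horizontal and vertical scan resolutions."""
    dpi_x = width_px / page_width_in
    dpi_y = height_px / page_height_in
    return min(dpi_x, dpi_y)


def meets_recommendation(width_px: int, height_px: int,
                         page_width_in: float, page_height_in: float,
                         target_dpi: float = 300.0) -> bool:
    """True if the scan reaches the target resolution in both directions."""
    dpi = effective_dpi(width_px, height_px, page_width_in, page_height_in)
    return dpi >= target_dpi


# Hypothetical scans of a US Letter page (8.5 x 11 inches).
full_res = meets_recommendation(2550, 3300, 8.5, 11.0)   # scanned at 300 DPI
half_res = meets_recommendation(1275, 1650, 8.5, 11.0)   # scanned at 150 DPI
```

In this example `full_res` is true and `half_res` is false, so the second scan would be worth redoing at a higher scanner setting before uploading.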

Open the document, and select all images you want to convert. Click the Recognize button.

Transkribus offers a range of public models for different languages and time periods. For immediate text recognition without any training, choose one that best matches your document’s characteristics, then click the Start Recognition button and wait. I went with The English Eagle model.

Handwritten text recognition jobs created by free users receive a low priority, so it may take a while for Transkribus to finish.

After the recognition process, fine-tune the results using the Transkribus document editor. It synchronizes text and image views for an intuitive editing process. You can use tags to mark entities, events, or uncertain transcriptions.

Train a Custom Model to Improve HTR Performance

To train a custom model, prepare your ground truth data. This involves accurately transcribing a set of handwritten documents that match the writing styles you want the model to recognize. The more varied and representative your data, the better your model will perform.

To train a model, click the Train New Model button. Select the Text Recognition Model option, choose the collection containing your ground truth document(s), then select the pages to include in the training and validation data. The training data is used to fit the parameters of the model, while the validation data, which the model does not learn from, provides an unbiased evaluation of the model's performance.
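To make the training and validation split concrete, here is a minimal sketch that randomly sets aside a share of transcribed pages for validation. It is an illustration only: the `split_pages` function, the 10% share, and the file names are assumptions, not settings taken from Transkribus.

```python
import random


def split_pages(pages, validation_share=0.1, seed=42):
    """Randomly split page identifiers into (training, validation) lists."""
    if not 0 < validation_share < 1:
        raise ValueError("validation_share must be between 0 and 1")
    if len(pages) < 2:
        raise ValueError("need at least two pages to split")
    shuffled = list(pages)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Keep at least one page for validation and at least one for training.
    n_val = max(1, round(len(shuffled) * validation_share))
    n_val = min(n_val, len(shuffled) - 1)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical ground truth: 50 transcribed page images.
pages = [f"page_{i:03d}.jpg" for i in range(1, 51)]
train_pages, val_pages = split_pages(pages)
```

With 50 pages and a 10% share, 45 pages go to training and 5 to validation, and no page appears in both lists.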

Configure the model's settings, such as the language and character set, then start the training process. Training runs for multiple cycles, or "epochs," in which the model learns from your data. Transkribus automatically stops the training when the model's performance stops improving.
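Stopping once performance no longer improves is commonly called early stopping. The sketch below shows the general idea on a list of per-epoch validation error rates. It is not Transkribus's actual implementation; the function name, the `patience` parameter, and the sample numbers are assumptions made for illustration.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation error has failed to improve on the
    best value seen so far for `patience` consecutive epochs. If that never
    happens, the number of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors, start=1):
        if error < best:
            # New best result: remember it and reset the counter.
            best = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_errors)


# Hypothetical validation error rates, one per epoch.
errors = [0.30, 0.22, 0.18, 0.17, 0.17, 0.18, 0.19, 0.16]
stop_epoch = early_stopping_epoch(errors, patience=3)
```

In this made-up run the error stops improving after epoch 4, so with a patience of three epochs the loop ends at epoch 7 and never sees the later value. A larger patience trades longer training for a lower risk of stopping too soon.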

After training, use your custom model to transcribe new documents with improved accuracy.
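One common way to quantify transcription accuracy is the character error rate (CER): the edit distance between the recognized text and a correct reference transcription, divided by the length of the reference. The following is a minimal sketch with hypothetical function names and sample strings, not the metric code of any particular tool.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a nine-character line.
reference = "Dear Anna"
recognized = "Dear Anma"
cer = character_error_rate(reference, recognized)
```

Comparing the CER of a public model and a custom model on the same manually transcribed pages gives a simple way to see whether training helped. Lower values are better.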

Alternatives to Transkribus

While Transkribus is my top choice for converting handwritten documents to text, there are many other great options:

  • Pen2Txt is a newcomer in the HTR landscape. It aims to deliver high accuracy by leveraging the latest in AI technology to adapt to diverse handwriting styles. While still a work in progress, Pen2Txt offers a user-friendly interface and solid performance. However, free users are limited to only three conversions.
  • Google Document AI is part of Google’s suite of AI-powered document processing tools. It offers excellent out-of-the-box recognition without training, making it a solid choice for quick conversions. You can get $300 in free credit to try the tool, but you’ll need to pay on a per-conversion basis for continued use.
  • GrabText is a simple online tool that captures handwritten or printed text from photos, graphics, and documents, and converts it into editable text. It offers a straightforward three-step process: capture the text, apply automatic corrections (including spelling and grammar), and export the converted text in various formats. Unfortunately, you need to invite a friend to use it for free.

Whether you choose Transkribus or one of the alternatives mentioned above, you’ll be able to digitize your documents with ease. If you’re looking for more options, learn how to convert images to text using OCR on Android.

Image credit: Pixabay. All screenshots by David Morelo.


David Morelo
Staff Writer

