Google Writes Up OCRopus

    April 11, 2007

Google is “happy to announce” the new OCRopus OCR Project, which will focus on the cutting edge of handwriting recognition technology.  Expect the Image Understanding and Pattern Recognition (IUPR) research group to do the heavy lifting, though; Google’s role will mostly be that of a sponsor.  (According to IUPR, the German state of Rhineland-Palatinate is also providing funding.)

Over the next three years, the project will pursue a pretty lofty goal: “a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis, and general desktop use.”  But if it succeeds, the payoff for all of those groups could be substantial.

In fact, the project is even winning some people back to the Google fan base.  “This is the sort of thing that makes me like Google again,” wrote InfoWorld’s Matt Asay.  “Great move, Google.  This looks like a fantastic project on which to work.”

If you’d like to see a preview, it’s “available on the project’s website under an Apache licence.”  You might also look at “the recently open sourced Tesseract OCR system, a separate Google project for probabilistic natural language modeling, and software for layout analysis and character recognition,” since elements of each will be rolled into the new effort.

OCRopus will, by the way, focus on recognizing English, even though IUPR is based in Germany.

Finally, despite its big-name sponsors, OCRopus could still use some help.  Thomas Breuel, the project’s leader, writes, “We are hoping for contributions by the open source community.”