Google Docs Converts Images/PDFs in 29 New Languages
By: Chris Crum - March 1, 2011
Google has expanded the Optical Character Recognition (OCR) feature in Google Docs to 29 new languages, bringing the total to 34.
OCR is the technology that analyzes images and PDF files and extracts their text and formatting so you can edit them. The feature was introduced last summer, and development of the technology was expected to be aided by scans of ancient texts.
In August, Google began letting users convert files in Google Docs using the technology. It works with PDFs, JPEGs, GIFs, and PNGs.
"Hit upload, and we’ll use this information to search for the right characters in your file," says Google software engineer Jason Schaeffer. "As usual, you will get best results with sharp, high-resolution images or PDF files. This update will also result in an improvement in OCR quality for languages that we’ve supported previously (English, French, Italian, German, Spanish). We’ve also made improvements to the way we import formatting from your documents, and are now doing a better job in preserving font and alignment information."
"We’ll keep adding languages and at the same time will continue to improve speed and accuracy for the existing ones," says Schaeffer. "In the meantime, we hope you take advantage of this new way to import your data into Google Docs."
More information about the feature, including language information, can be found here.