Back in 2008, Google filed a patent application that was recently published for public viewing. Titled "Segmenting Printed Media Pages Into Articles," it suggests that the company wants to pull individual articles out of print publications and turn them into standalone articles on the web. The abstract says:
Methods and systems for segmenting printed media pages into individual articles quickly and efficiently. A printed media based image that may include a variety of columns, headlines, images, and text is input into the system which comprises a block segmenter and an article segmenter system. The block segmenter identifies and produces blocks of textual content from a printed media image while the article segmenter system determines which blocks of textual content belong to one or more articles in the printed media image based on a classifier algorithm. A method for segmenting printed media pages into individual articles is also presented.
An archived newspaper page in Google News (content not separated)
Hat tip to Erik Sherman, writing for BNET, who says, "Although this could allow Google to convert stacks of periodicals into electronic archives, it potentially sends the company headlong into conflict with a famous Supreme Court ruling on media law."
"There’s just one legal problem: New York Times Co., et al. v. Jonathan Tasini, et al. Usually called the Tasini case, freelance writers sued the New York Times and other print publications for licensing individual articles to database companies without permission from the writers, who retained the copyright on the articles," he explains. "One of the main turning points was that the publishers had explicit permission only to include the articles in the print publication. However, copyright law did not allow the publishers to break their publications up and make the articles accessible to readers out of the original context."
He goes on to note that Google could reach far enough back into old print archives to predate such rights issues, when it would be dealing with freelance writers who mostly didn't copyright their articles. The technology could also be used in any future partnerships between Google and print publishers, should those publications ever wish to go that route.
What do you make of the patent? You can read the entire patent application here, in patent application-speak.