Google News Archive: Things You Should Know
Gary Price has written a huge post over at Resource Shelf about Google News’ new Archive Search.
The whole thing will take you a while to read, but there’s enough in there to make it worth it. He points out that many articles are duplicated across multiple databases, free from one source and paid from another. Before you buy an article, make sure there isn’t a free source for the exact same text.
Sometimes even the same source will have it for free! NewspaperArchive.com appears to be trying to trick Google users: if you click through from a Google Archive result, you are taken to a preview page inviting you to sign up for their $6-a-month membership, while simply using the search box on their own site will get you the PDF for free!
Check it out:
Google News Archive search results (click the first result)
In addition, sometimes entering the text from Google’s search results snippet into Google’s regular web search engine can net you the article for free. Before you buy, always take a look.
Meanwhile, a few bloggers are noticing modern words that show up in very old articles. For example, Mahlon found “email” as far back as 1772, predating the invention of email by about two hundred years! Perhaps Google has uncovered evidence of George Washington’s Hotmail account? And Amit Agarwal found “Google” in a U.S. Supreme Court proceeding from 1759 (the Court was established in 1789, whoops!), “Microsoft” in a 1900 article about whether Linux was Enterprise Ready (double whoops!), “weblogs” in 1857, “Windows XP Media Center Edition” in 1985, and a 1926 Oakland Tribune article that says Firefox is “queer, indeed”.
It seems quite clear that many of the databases Google is pulling from contain dating errors and massive OCR errors. If Google wants the News Archive to really work out, without complaints and calls for refunds, maybe it should help the archives by giving them better software to manage their collections.
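For the curious, here is a rough sketch of the kind of sanity check that could flag these anachronisms. Nothing here reflects how Google or the archives actually work: the `ArchiveResult` record, the `find_anachronisms` function, and the table of "earliest plausible" years are all made up for illustration, and the years themselves are approximate assumptions.

```python
from dataclasses import dataclass

# Hypothetical lookup: the earliest year each term could plausibly appear
# in print. These years are rough, illustrative assumptions only.
EARLIEST_PLAUSIBLE_YEAR = {
    "email": 1970,
    "microsoft": 1975,
    "linux": 1991,
    "weblogs": 1997,
    "google": 1997,
    "windows xp": 2001,
}


@dataclass
class ArchiveResult:
    """A hypothetical archive record: a title, a claimed year, and OCR text."""
    title: str
    year: int
    text: str


def find_anachronisms(result: ArchiveResult) -> list[str]:
    """Return the terms in the text that are too modern for the claimed year."""
    lowered = result.text.lower()
    return sorted(
        term
        for term, first_year in EARLIEST_PLAUSIBLE_YEAR.items()
        if term in lowered and result.year < first_year
    )


suspect = ArchiveResult(
    title="Enterprise Ready?",
    year=1900,
    text="Microsoft weighs in on whether Linux is Enterprise Ready",
)
print(find_anachronisms(suspect))  # ['linux', 'microsoft']
```

A record that comes back with any hits probably has the wrong date attached, so it could be held back for a human to look at rather than shown (and sold) as an article from 1900. Plain substring matching is crude, though, and would need tightening before anyone relied on it.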
Visit the InsideGoogle blog.