Archive-It Collects Keyword Search

    December 27, 2007
    WebProNews Staff

The addition of keyword search should further the goals of Archive-It, a project of the Internet Archive.

The Archive-It service counts the Library of Congress and the US National Archives among its partners. Those partner institutions both establish and curate Archive-It’s collections.

Each collection focuses deeply on a topic; a recent collection on the devastating wildfires in Southern California, for example, archives views of the event from a variety of perspectives.

Finding something in particular within a collection became easier recently. Resource Shelf noted the addition of an advanced search interface that includes keyword search.

Those who routinely use Internet search engines already understand the utility of keyword searching. Until this new change at Archive-It, the collections lacked that functionality; in fact, the Internet Archive does not have keyword searching yet.

The advanced search, powered by the open source Nutch platform, allows searchers to filter results by query terms, host URL, file format, or specific partner.

Another feature provides a future benefit: queries may be saved as RSS feeds. As new results matching the query appear, the feed will deliver them to one’s feed reader.

“It is a fallacy that if something is on the web, it will be there forever,” Archive-It said in its mission statement. “The average lifespan of a web page is 44-75 days.”

“What is here today might be gone tomorrow,” the statement continued, underscoring the need for collections. Archive-It’s collections may be browsed as well as searched, since people may want a better idea of what is available before running a query.