Harvard to Offer Book Metadata for Search
By: Mike Fossum - April 24, 2012
Harvard University is set to release the metadata for more than 12 million books, audio recordings, manuscripts, videos, images, maps and other media held in its 73 libraries. Only the metadata will be released. The full contents of the works themselves remain protected by various intellectual property restrictions. The metadata will be available for download from Harvard and from the Digital Public Library of America.
The metadata includes titles, recording and publication dates, video descriptions, book lengths and similar details. These descriptions alone are valuable to search engines, which still rely on metadata when compiling results. David Weinberger, co-director of Harvard’s Library Lab, states, “This is Big Data for books – There might be 100 different attributes for a single object.” Weinberger adds that during a one-day trial run, moderators combed through 600,000 library items and found that users had created visual timelines tracing the broad publication of ideas, as well as “virtual stacks” showing the locations of various volumes.
Harvard also plans to add circulation information to the database. However, Stuart Shieber, Director of Harvard’s Office of Scholarly Communication, points out: “We have to be careful how we do that, to avoid releasing any personal information.” A privacy breach of that kind could shut the entire project down, much like the recent trouble with ICANN’s generic top-level domain application system.
Shieber, who led the project, adds, “This data serves to link things together in ways that are difficult to predict – The more information you release, the more you see people doing innovative things.” Harvard hopes other libraries will follow its lead and expand this new pool of openly available information.
Big data has also been reshaping the music industry lately. Companies such as the music intelligence platform The Echo Nest mine the vast amounts of information compiled on the web to better connect cultural trends with content. Harvard’s metadata release could likewise open up new and broader ways of understanding large bodies of knowledge.