IBM’s UIMA Goes From Search To Concept

    December 19, 2005
    WebProNews Staff

People are better at answering questions than search engines because they understand the concept behind a question. Search engines do well on context, but IBM sees concepts as the next great advance in search technology.

If you remember something about a particular movie, a quote or a scene, but not the movie itself, a search engine may be very helpful in making the connection. But if it’s something very obscure and a Google or a Yahoo can’t find it, it isn’t a big deal.

What if it’s your physician trying to find out more about the symptoms of an illness you have? Does locating information based on those symptoms become a big deal for you?

Most likely the answer will be yes. Making “concept” the operative idea in search is where IBM wants to take enterprise search. For IBM, it has stopped being about keywords.

It’s about discovering the concepts in content.

Marc Andrews, director for strategy & business development for content discovery, spoke to me about how discovery works, and it all starts with UIMA, Unstructured Information Management Architecture.

I asked him if UIMA was application aware as well as data aware, and he indicated it went beyond that. “It’s information aware, whatever format information comes in, wherever it’s stored across the enterprise, and it’s really going beyond the traditional search.”

Andrews noted how search solutions don’t really go beyond web and file servers when it comes to spidering. “They’ve really ignored all of the enterprise knowledge that is being managed in their content management environment; that’s being stored in databases supporting their different applications, and potentially even in mainframes.”

Content discovery offers a more holistic view of data across the enterprise. “We’ve been focusing on enabling organizations to do a lot more than just search,” Andrews said. “One of the major limitations organizations have today is they’re limited to keyword-based search capabilities. That ends up falling short of most organizations’ needs.”

UIMA could fulfill those needs. Andrews said development of the architecture began about three years ago. “It started out as a project in IBM Research because we had over 200 researchers working across eight labs in six or seven different countries.

“They were all developing different types of text analytics. We needed to be able to have those different components interoperate with and build upon each other,” Andrews said.

By developing UIMA, IBM created an open framework that could accept those text analytic components as plug-ins, interpret the meaning of unstructured information, and identify concepts and facts, so that a search can look for more than mentions of words.
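As a rough illustration of the plug-in idea, and not the UIMA API itself, the sketch below shows how independent text analytics might share one document and build on each other’s output. Every name in it (`Document`, `Annotation`, `make_term_annotator`, `co_mention_annotator`, `run_pipeline`) and the tiny term lists are hypothetical, invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    begin: int   # character offset where the span starts
    end: int     # character offset just past the span
    label: str   # concept type, e.g. "Drug" or "Symptom"
    text: str    # the covered text


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# A plug-in is any callable that reads a document and adds annotations to it.
Annotator = Callable[[Document], None]


def make_term_annotator(label: str, terms: List[str]) -> Annotator:
    """Build a plug-in that labels every occurrence of the given terms."""
    def annotate(doc: Document) -> None:
        lowered = doc.text.lower()
        for term in terms:
            needle = term.lower()
            start = lowered.find(needle)
            while start != -1:
                end = start + len(needle)
                doc.annotations.append(
                    Annotation(start, end, label, doc.text[start:end]))
                start = lowered.find(needle, end)
    return annotate


def co_mention_annotator(doc: Document) -> None:
    """Build on earlier plug-ins: flag documents naming a drug and a symptom."""
    drugs = [a for a in doc.annotations if a.label == "Drug"]
    symptoms = [a for a in doc.annotations if a.label == "Symptom"]
    if drugs and symptoms:
        begin = min(a.begin for a in drugs + symptoms)
        end = max(a.end for a in drugs + symptoms)
        doc.annotations.append(
            Annotation(begin, end, "DrugSymptomCoMention", doc.text[begin:end]))


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Run each plug-in in order over one shared document."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


# Hypothetical term lists, for illustration only.
pipeline = [
    make_term_annotator("Drug", ["ibuprofen"]),
    make_term_annotator("Symptom", ["headache"]),
    co_mention_annotator,
]
result = run_pipeline("Ibuprofen may relieve headache.", pipeline)
labels = [a.label for a in result.annotations]
```

The design point is that the third component never looks at the raw text for drugs or symptoms; it relies only on what the earlier components recorded, which is the kind of interoperation Andrews describes.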

Now let’s come back to the scenario where you visit the doctor with a variety of symptoms, a list that has even the most talented of clinicians shaking his or her head. It’s an appropriate topic since the earliest adopters of UIMA have been organizations like the Mayo Clinic and Sloan-Kettering Cancer Center.

“An expert knows to go in and search for ten different ways of describing the symptom. Your typical patient or even doctor refers to things in different ways. They want to be able to search for a symptom or search for any drugs that relate to the symptom and find all of that information,” Andrews said.

“They’re doing it for clinical trials research and drug research, to be able to identify concepts and facts. So those organizations are leveraging UIMA today to incorporate these types of analytics.

“Doctors and patients wanted to find out more about different clinical trials that are going on,” he continued. “These clinical trials are being managed by various organizations, information is scattered across NIH (National Institutes of Health) and other different databases, and included clinical trials being conducted” by various pharmaceutical firms.
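Andrews’s point about an expert searching “ten different ways of describing the symptom” can be made concrete with a toy comparison of keyword search and concept search. This is a minimal sketch under stated assumptions: the phrase-to-concept table, the sample notes, and the function names are all hypothetical, and a real system would derive concepts with far richer analytics than a lookup table.

```python
from typing import Dict, List, Set

# Hypothetical table mapping surface phrases to a shared concept.
CONCEPTS: Dict[str, str] = {
    "headache": "HEADACHE",
    "head pain": "HEADACHE",
    "cephalalgia": "HEADACHE",
    "fever": "FEVER",
    "pyrexia": "FEVER",
}


def concepts_in(text: str) -> Set[str]:
    """Return every concept whose phrase appears in the text."""
    lowered = text.lower()
    return {concept for phrase, concept in CONCEPTS.items() if phrase in lowered}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each concept to the ids of the documents that mention it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for concept in concepts_in(text):
            index.setdefault(concept, set()).add(doc_id)
    return index


def concept_search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Find documents sharing a concept with the query, whatever the wording."""
    hits: Set[str] = set()
    for concept in concepts_in(query):
        hits |= index.get(concept, set())
    return sorted(hits)


def keyword_search(docs: Dict[str, str], query: str) -> List[str]:
    """Baseline: find documents containing the literal query string."""
    return sorted(d for d, text in docs.items() if query.lower() in text.lower())


# Hypothetical sample notes, for illustration only.
notes = {
    "a": "Patient reports head pain.",
    "b": "Cephalalgia noted after dosing.",
    "c": "Pyrexia for two days.",
}
index = build_index(notes)
by_keyword = keyword_search(notes, "headache")
by_concept = concept_search(index, "headache")
```

Here the literal keyword never appears in any note, so the keyword baseline comes back empty, while the concept lookup still reaches the two notes that describe the same symptom in other words.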

Medical centers and universities like Stanford, Carnegie Mellon, and Columbia were running into search-related challenges. So was the organization that gave birth to the Internet, the Defense Advanced Research Projects Agency (DARPA).

It was DARPA that sponsored the first working group for UIMA. From that working group, the first version of UIMA became available at the beginning of 2005. In August 2005, IBM announced its intent to make UIMA available as open source. Andrews said there have been about 3,000 downloads of the UIMA framework.

Proof-of-concept work with UIMA is under way at some Fortune 500 companies, but Andrews wasn’t in a position to name them yet. More information on that should be coming early in 2006.

David Utter is a staff writer for WebProNews covering technology and business.