OPLIN 4Cast #243: Natural language Question Answering

Posted in 4cast

An article caught our attention last week concerning some additions to Gartner Research’s Hype Cycle for Emerging Technologies. If you’re not familiar with the Hype Cycles, the Gartner website has a good explanation, but what they basically do is predict the life cycle of technologies as they move through inflated expectations to eventual productivity. One of the emerging technologies in the current hype cycle is natural language Question Answering (QA), which Gartner predicts will reach mainstream adoption in 5-10 years. For libraries, this is reminiscent of the plot of that old movie Desk Set; will computers – similar to the IBM Watson system that recently competed on Jeopardy! – soon be replacing reference librarians?

  • How does QA technology compare to document search? (IBM DeepQA Project FAQ)  “The key difference between QA technology and document search is that document search takes a keyword query and returns a list of documents, ranked in order of relevance to the query, while QA technology takes a question expressed in natural language, seeks to understand it in much greater detail, and returns a precise answer to the question.”
  • What is artificial intelligence? (New York Times Opinion/Richard Powers)  “Open-domain question answering has long been one of the great holy grails of artificial intelligence.[…] It goes well beyond what search engines like Google do when they comb data for keywords. Google can give you 300,000 page matches for a search of the terms ‘greyhound,’ ‘origin’ and ‘African country,’ which you can then comb through at your leisure to find what you need. Asked in what African country the greyhound originated, Watson can tell you in a couple of seconds that the authoritative consensus favors Egypt.”
  • Katz explains contributions to Watson Jeopardy! challenge (MIT CSAIL News/Abby Abazorius)  “[Principal Research Scientist Boris] Katz’s model of syntactic decomposition helps Watson decipher complex, multi-pronged questions by allowing the system to understand that it needs to tackle several sub-questions. The system then uses an algorithm that helps it decide which sub-questions to answer and in what order, and compiles the gathered information into a cohesive, and hopefully correct, answer.”
  • An analysis of the AskMSR question-answering system (Microsoft Research/Eric Brill et al.) [pdf]  “Typically, when deploying a question answering system, there is some cost associated with returning incorrect answers to a user. Therefore, it is important that a QA system has some idea as to how likely an answer is to be correct, so it can choose not to answer rather than answer incorrectly.[…] Ideally, we would like to be able to determine the likelihood of answering correctly solely from an analysis of the question.”
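The AskMSR excerpt describes a system that abstains rather than risk a wrong answer when its confidence is low. As a minimal sketch of that idea (the function name, candidate list, and confidence scores below are all illustrative assumptions, not the actual AskMSR implementation):

```python
def best_answer(candidates, threshold=0.5):
    """Return the highest-scoring candidate answer, or None (abstain)
    if the top candidate's confidence does not clear the threshold.

    candidates: list of (answer, confidence) pairs -- a toy stand-in
    for whatever scoring a real QA system would produce.
    """
    if not candidates:
        return None
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return answer if confidence >= threshold else None

# Hypothetical candidate answers for "In what African country
# did the greyhound originate?"
candidates = [("Egypt", 0.82), ("Morocco", 0.11), ("Ethiopia", 0.07)]

print(best_answer(candidates))                 # → Egypt (confident enough)
print(best_answer(candidates, threshold=0.9))  # → None (abstains)
```

The key design point from the excerpt is the threshold: when the cost of a wrong answer is high, the system can raise the bar and simply decline to answer.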

Watson fact:
The IBM Watson computer system used on Jeopardy! had 200 million pages of information stored in its memory, including the full text of Wikipedia.