2010-10-12

NELL: Never-Ending Language Learning -- A Computer System that Learns to Read The Web

Smarter Than You Think - Aiming to Learn as We Do, A Machine Teaches Itself - NYTimes.com



Since the start of the year, a team of researchers at Carnegie Mellon University — supported by grants from the Defense Advanced Research Projects Agency and Google, and tapping into a research supercomputing cluster provided by Yahoo — has been fine-tuning a computer system that is trying to master semantics by learning more like a human. Its beating hardware heart is a sleek, silver-gray computer — calculating 24 hours a day, seven days a week — that resides in a basement computer center at the university, in Pittsburgh. The computer was primed by the researchers with some basic knowledge in various categories and set loose on the Web with a mission to teach itself.

The Never-Ending Language Learning system, or NELL, has made an impressive showing so far. NELL scans hundreds of millions of Web pages for text patterns that it uses to learn facts, 390,000 to date, with an estimated accuracy of 87 percent. These facts are grouped into semantic categories — cities, companies, sports teams, actors, universities, plants and 274 others. The category facts are things like “San Francisco is a city” and “sunflower is a plant.”



  Can computers learn to read? We think so. "Read the Web" is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, our computer system called NELL (Never-Ending Language Learner) has been running continuously, attempting to perform two tasks each day:
  • First, it attempts to "read," or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)).
  • Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.
As of October 2010, NELL has acquired a knowledge base of nearly 440,000 beliefs that it has read from various web pages. It is not perfect, but NELL is learning. You can track NELL's progress, browse and download its knowledge base, and read more about our technical approach.