TextTeaser Lets Developers Integrate Text Summarization Into Their Apps And Sites

TextTeaser is a service that creates tl;dr (too long; didn’t read) summaries of lengthy online articles. Available as a Web service and as an API on Mashape, TextTeaser is also meant to become the “imgur of text summarization”: its developers want to build a platform where users can upload summaries of their favorite articles. The service was created by Jolo Balbin as part of his graduate research.

Balbin programmed the adaptive algorithm that powers TextTeaser while working on his master’s degree in computer science at De La Salle University in Manila. Now a data scientist at job-matching site Bright.com, Balbin maintains TextTeaser with fellow programmer Ben Sarmiento. Balbin claims TextTeaser, which can summarize articles in any language written in the Roman alphabet, is more accurate than the text summarization tools Cruxbot and Summly (the latter was purchased by Yahoo in March 2013 for a reported $30 million).

The bullet-pointed summaries created by TextTeaser can be shared as a link or image or embedded as HTML. Here is how it summarized Sean Parker’s June TechCrunch article, a response to media criticism about his wedding that was in turn criticized by the media for its 10,000-word length.

For other examples of TextTeaser in action, see this roundup of summaries or read what a TextTeaser-powered Reddit bot has produced for news summary subreddits.

Like other text summarization tools, TextTeaser’s algorithm sometimes misses important information, but I found it handy for creating a quick abstract of key points before I delve into longer articles. Balbin envisions students using it to help them study, or developers integrating it into news aggregators and read-it-later services such as Instapaper or Pocket. One site that has already started using TextTeaser is Bit Of News, which delivers daily email summaries of headline news.

Balbin first began working on a text summarization algorithm as an undergraduate and expanded on the project while pursuing his master’s. The algorithm looks at four things. First, it considers the relationship of the words in the title to the rest of the article. The second factor is sentence length, and the third is sentence position (Balbin says the second sentence of an article is more important than the first because that is where most authors introduce key points). The fourth is keyword frequency: if certain words appear more frequently on the Web site that published the article, sentences containing those words are given more weight.
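
To make those four factors concrete, here is a rough sketch in Python of how a scorer along these lines might be put together. This is not TextTeaser's actual code, which had not been published at the time of writing. Every function name, the equal weighting of the four scores, the 20-word "ideal" sentence length, and the exact position values are assumptions made for illustration only.

```python
import re

def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(text):
    # Lowercased word tokens (Roman alphabet only).
    return re.findall(r"[a-z']+", text.lower())

def title_score(sentence_words, title_words):
    # Factor 1: share of distinct title words that also appear in the sentence.
    title_set = set(title_words)
    if not title_set:
        return 0.0
    return len(set(sentence_words) & title_set) / len(title_set)

def length_score(sentence_words, ideal=20):
    # Factor 2: 1.0 at the assumed ideal length, falling linearly to 0.
    return max(0.0, 1.0 - abs(ideal - len(sentence_words)) / ideal)

def position_score(index):
    # Factor 3: the second sentence (index 1) ranks above the first,
    # as Balbin describes; the specific numbers are invented.
    if index == 1:
        return 1.0
    if index == 0:
        return 0.8
    return 1.0 / (index + 1)

def keyword_score(sentence_words, keyword_weights):
    # Factor 4: average weight of the sentence's words, where the weights
    # come from how often each word appears on the publishing site.
    if not sentence_words:
        return 0.0
    total = sum(keyword_weights.get(w, 0.0) for w in sentence_words)
    return total / len(sentence_words)

def summarize(title, text, site_word_counts, top_n=3):
    # Score every sentence, keep the top_n, and return them in article order.
    sentences = split_sentences(text)
    title_words = words(title)
    total = sum(site_word_counts.values()) or 1
    keyword_weights = {w: c / total for w, c in site_word_counts.items()}
    scored = []
    for i, sentence in enumerate(sentences):
        sw = words(sentence)
        score = (title_score(sw, title_words) + length_score(sw)
                 + position_score(i) + keyword_score(sw, keyword_weights)) / 4
        scored.append((score, i, sentence))
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:top_n]
    return [sentence for _, _, sentence in sorted(top, key=lambda t: t[1])]
```

Calling summarize(title, article_text, site_word_counts) with a dictionary of word counts gathered from the publishing site returns the highest-scoring sentences in their original order, which roughly matches the bullet-pointed output described earlier. A real implementation would presumably tune the weights rather than average the four scores equally.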

TextTeaser’s API and documentation are currently available on Mashape and will be made open-source in the near future, says Balbin.