Building a smarter Hacker News

With only 24 hours to build an app, it’s never easy to come up with a viable idea, never mind build one that involves training a machine learning algorithm. Yet that’s exactly what a team of four friends accomplished this weekend at the annual TechCrunch Disrupt New York Hackathon.

The team — made up of twin brothers Daniel and David Robinson, Nathan Gould and Chris Riederer — built a website called Tagger News that takes a subset of Hacker News articles, analyzes the content using machine learning algorithms and applies appropriate subject tags. The team included two data scientists, a product architect at a blockchain company and a Columbia computer science PhD student. Oh, and David Robinson co-wrote a book called Text Mining with R for O’Reilly Media. Clearly they didn’t lack skills or brain power.

Tagger News website with custom tagging applied to articles

They came up with the idea because Daniel Robinson has always thought that although Hacker News offers a great way to share articles, it is difficult to discover content there. He wanted to make it easier to find content related to a particular subject, and Tagger News was born.

To speed things along in the 24-hour development process, the team divided up the work. David Robinson and Riederer trained the machine learning algorithms, while Daniel Robinson and Gould created the website, which is live now.

The challenge with their project was grabbing 25,000 articles from the Hacker News API, then using scikit-learn, a Python machine learning library, to train a model to understand the content. The tags are generated automatically by a machine learning algorithm called Random Forests, which recognizes combinations of words that belong to a given topic and groups articles by subject.
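For readers curious what that looks like in practice, here is a minimal sketch of tagging text with a Random Forest in scikit-learn. It is not the team's code: the toy training data, the tag names, the `train_tagger` helper and the choice of TF-IDF word features are all assumptions made for illustration, and it assigns a single tag per article, whereas the real site may apply several.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set: (article text, subject tag).
# The real project reportedly used about 25,000 Hacker News articles.
TRAINING_ARTICLES = [
    ("Bitcoin miners see blockchain transaction fees rise", "crypto"),
    ("Ethereum smart contract bug drains blockchain wallet", "crypto"),
    ("New Python release speeds up the interpreter", "programming"),
    ("Rust compiler adds better error messages for programmers", "programming"),
    ("Startup raises seed funding from venture investors", "startups"),
    ("Founders share lessons on venture funding rounds", "startups"),
]


def train_tagger(examples, n_trees=100, seed=0):
    """Fit a text-to-tag model: TF-IDF word features feeding a Random Forest."""
    texts = [text for text, _ in examples]
    tags = [tag for _, tag in examples]
    model = make_pipeline(
        # Turn each article into a weighted bag-of-words vector.
        TfidfVectorizer(stop_words="english"),
        # An ensemble of decision trees that vote on the tag.
        RandomForestClassifier(n_estimators=n_trees, random_state=seed),
    )
    model.fit(texts, tags)
    return model


tagger = train_tagger(TRAINING_ARTICLES)
predicted = tagger.predict(["Blockchain wallet fees climb for bitcoin users"])
print(predicted[0])
```

Because each tree in the forest splits on different words, the ensemble can pick up on combinations of words rather than any single keyword, which is the behavior the article describes.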

Photo: Tagger News team

They used three of the four computers to grab the data from Hacker News and train the algorithms, which presented its own challenges when the Hacker News API timed out on occasion. Meanwhile, the web development team banged away on the fourth computer building the website.
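One common way to cope with an API that occasionally times out is to retry with a growing delay. The sketch below shows the idea; the article does not say how the team handled it, and the endpoint URL, the `fetch_item` and `with_retries` names, and the retry settings are all assumptions.

```python
import json
import socket
import time
import urllib.error
import urllib.request

# Assumed endpoint of the public Hacker News API; the article does not give it.
ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def fetch_item(item_id, timeout=5.0):
    """Download one Hacker News item as a dict, giving up after `timeout` seconds."""
    url = ITEM_URL.format(item_id=item_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def with_retries(fetch, item_id, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(item_id), retrying on timeouts with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(item_id)
        except (TimeoutError, socket.timeout, urllib.error.URLError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

A crawl over many articles would then call something like `with_retries(fetch_item, item_id)` for each ID, so a single slow response delays the job instead of ending it.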

In spite of the challenges, the team got everything working, and you can actually go to the site and click a tag to see all the articles associated with it. It’s worth mentioning that the four team members have been friends for some time, and this is their fifth TechCrunch Disrupt Hackathon.