LinkedIn Open-Sources FeatureFu, A Toolkit For Building Machine Learning Models

LinkedIn today announced that it is open-sourcing an internal tool called FeatureFu. The FeatureFu toolkit is meant to make it easier for developers to build their machine learning models around statistical modeling and decision engines.

The idea is to take LinkedIn’s knowledge of “feature engineering” and make it accessible to developers outside the company. In machine learning, feature engineering means using detailed knowledge of the phenomenon you are studying to design the input variables, or features, that a model learns from.

LinkedIn argues that most large-scale recommendation systems (think LinkedIn’s own tools for suggesting connections on its site) are managed by at least two teams: one that handles the offline modeling and one that takes care of the online feature-serving/model-scoring part of the system. This leads to a number of problems FeatureFu is trying to solve.

“Many large-scale recommendation systems are brittle and vulnerable. FeatureFu allows for creative and agile development on these systems so that shipping new features doesn’t take weeks or even months,” LinkedIn senior software engineer Bing Zhao told me.

A small change in how one team generates features can create a lot of work for the other team, for example, and the split also makes it hard to experiment with different feature and model techniques.

FeatureFu uses a small Java library called Expr that developers can use to transform and build upon an existing set of features. “Once deployed to an online feature generation framework, it eliminates any further need for code-change to ship models for a wide range of derived features,” Zhao said of the system’s advantages.
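To make the "no further code change" idea concrete, here is a minimal sketch of how a derived feature can be described as data rather than as code. It is not the Expr API: the class name `DerivedFeatureSketch`, the prefix-notation formula syntax, the handful of operators, and the feature names `clicks` and `impressions` are all assumptions made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Expr API: evaluates a prefix-notation formula
// such as "(/ clicks (+ impressions 1))" against a map of raw feature values.
final class DerivedFeatureSketch {

    // Tokenize the formula, evaluate it, and reject leftover tokens.
    static double evaluate(String expression, Map<String, Double> features) {
        Deque<String> tokens = new ArrayDeque<>();
        String spaced = expression.replace("(", " ( ").replace(")", " ) ").trim();
        for (String token : spaced.split("\\s+")) {
            tokens.add(token);
        }
        double result = parse(tokens, features);
        if (!tokens.isEmpty()) {
            throw new IllegalArgumentException("Trailing tokens: " + tokens);
        }
        return result;
    }

    // Recursive descent: "(" op arg... ")" or a feature name or a numeric literal.
    private static double parse(Deque<String> tokens, Map<String, Double> features) {
        String token = tokens.poll();
        if (token == null) {
            throw new IllegalArgumentException("Unexpected end of expression");
        }
        if (token.equals("(")) {
            String op = tokens.poll();
            if (op == null) {
                throw new IllegalArgumentException("Missing operator");
            }
            List<Double> args = new ArrayList<>();
            while (!tokens.isEmpty() && !tokens.peek().equals(")")) {
                args.add(parse(tokens, features));
            }
            if (tokens.poll() == null) {
                throw new IllegalArgumentException("Missing ')'");
            }
            return apply(op, args);
        }
        if (token.equals(")")) {
            throw new IllegalArgumentException("Unexpected ')'");
        }
        Double value = features.get(token);
        // Anything that is not a known feature name must be a numeric literal.
        return value != null ? value : Double.parseDouble(token);
    }

    // The small, fixed operator set assumed for this sketch.
    private static double apply(String op, List<Double> args) {
        switch (op) {
            case "+":
                return args.stream().mapToDouble(Double::doubleValue).sum();
            case "*":
                return args.stream().mapToDouble(Double::doubleValue).reduce(1.0, (a, b) -> a * b);
            case "-":
                requireArity(op, args, 2);
                return args.get(0) - args.get(1);
            case "/":
                requireArity(op, args, 2);
                return args.get(0) / args.get(1);
            case "log1p":
                requireArity(op, args, 1);
                return Math.log1p(args.get(0));
            default:
                throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    private static void requireArity(String op, List<Double> args, int expected) {
        if (args.size() != expected) {
            throw new IllegalArgumentException(op + " expects " + expected + " argument(s)");
        }
    }

    public static void main(String[] args) {
        Map<String, Double> raw = Map.of("clicks", 30.0, "impressions", 120.0);
        // The formula is a plain string, so it could ship inside a model config.
        String smoothedCtr = "(/ clicks (+ impressions 1))";
        System.out.println(evaluate(smoothedCtr, raw));
    }
}
```

Because the formula is just a string, a modeling team could in principle add or change a derived feature by editing configuration, while the serving code that evaluates formulas stays untouched. That is the kind of decoupling Zhao describes, though the real library's syntax and feature set may differ from this sketch.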

So why did LinkedIn decide to open-source this tool? “When we have a business need for software, we first look to see if there are pre-existing software projects in open source. If there isn’t then we create it ourselves,” Zhao told me. “As long as the software isn’t a business differentiator, then we often open source the project so that many can benefit.”

Zhao also said he hopes FeatureFu will be widely adopted. “FeatureFu could become a common technique for many machine learning systems,” he said. “It enables feature engineering to be more agile, which is one of the keys to success for machine learning applications. So we wanted to share our work with the industry.”