What Exactly Is GitHub Anyway?

Andreessen Horowitz announced a whopping $100 million investment in GitHub this week. You can read commentary and speculation all over the web about what GitHub will do with the money, whether this was a good investment for Andreessen Horowitz and whether taking such a large investment is a good thing for GitHub.

But what the heck is GitHub and why are developers so excited about it? You may have heard that GitHub is a code sharing and publishing service, or that it’s a social networking site for programmers. Both statements are true, but neither explains exactly why GitHub is special.

At the heart of GitHub is Git, an open source project started by Linux creator Linus Torvalds. Matthew McCullough, a trainer at GitHub, explains that Git, like other version control systems, manages and stores revisions of projects. Although it’s mostly used for code, McCullough says Git could be used to manage any other type of file, such as Word documents or Final Cut projects. Think of it as a filing system for every draft of a document.

Some of Git’s predecessors, such as CVS and Subversion, have a central “repository” of all the files associated with a project. McCullough explains that when a developer makes changes, those changes are made directly to the central repository. With distributed version control systems like Git, if you want to make a change to a project you copy the whole repository, history included, to your own system. You record your changes on your local copy, then you “check in” the changes to the central server (in Git’s own terms, you commit locally and then push). McCullough says this encourages the sharing of more granular changes since you don’t have to connect to the server every time you make a change.
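The distributed workflow described above can be sketched in a few lines of Python. This is a toy model, not how Git is implemented: the Repository class, its clone and push methods, and the "central" and "laptop" names are all invented for illustration, and the sketch assumes nobody else changes the central copy in the meantime.

```python
import copy


class Repository:
    """Toy repository: an ordered list of (message, snapshot) commits."""

    def __init__(self, name):
        self.name = name
        self.commits = []  # the full history lives in every copy

    def commit(self, message, files):
        # Record a snapshot of the files; no network access is needed.
        self.commits.append((message, dict(files)))

    def clone(self, name):
        # Copy the whole repository, history included, to a new location.
        local = Repository(name)
        local.commits = copy.deepcopy(self.commits)
        return local

    def push(self, remote):
        # Send the remote only the commits it has not seen yet.
        # Assumption: the remote has not changed since we cloned it.
        new_commits = self.commits[len(remote.commits):]
        remote.commits.extend(copy.deepcopy(new_commits))
        return len(new_commits)


central = Repository("central")
central.commit("initial import", {"README": "hello"})

# Work happens on the local copy, one small commit at a time.
laptop = central.clone("laptop")
laptop.commit("fix typo", {"README": "Hello"})
laptop.commit("add license", {"README": "Hello", "LICENSE": "MIT"})

# A single connection to the server shares both commits.
pushed = laptop.push(central)
print(pushed, len(central.commits))
```

The two local commits reach the central repository in one push, which is the point McCullough makes about granular changes: recording a change and sharing it are separate steps.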

GitHub is a Git repository hosting service, but it adds many of its own features. While Git is a command line tool, GitHub provides a Web-based graphical interface. It also provides access control and several collaboration features, such as wikis and basic task management tools for every project.

The flagship functionality of GitHub is “forking” – copying a repository from one user’s account to another. This enables you to take a project that you don’t have write access to and modify it under your own account. If you make changes you’d like to share, you can send a notification called a “pull request” to the original owner. That user can then, with a click of a button, merge the changes found in your repo with the original repo.
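To make the fork, pull request and merge cycle concrete, here is a minimal Python sketch. It does not use GitHub’s actual API; the Repo and PullRequest classes and the users “alice” and “bob” are hypothetical, and the merge assumes the original repository gained no commits of its own after the fork.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    owner: str
    name: str
    commits: list = field(default_factory=list)

    def fork(self, new_owner):
        # A fork is a copy of the repository under another user's account.
        return Repo(new_owner, self.name, list(self.commits))


@dataclass
class PullRequest:
    source: Repo   # the fork containing the proposed changes
    target: Repo   # the original repository
    merged: bool = False

    def merge(self):
        # Append the commits the target lacks (the "click of a button").
        # Assumption: the target has no new commits since the fork was made.
        missing = self.source.commits[len(self.target.commits):]
        self.target.commits.extend(missing)
        self.merged = True
        return missing


original = Repo("alice", "project", ["initial commit"])

# bob has no write access to alice's repo, so he works in his own fork.
fork = original.fork("bob")
fork.commits.append("fix bug in parser")

# bob notifies alice; alice accepts the changes.
request = PullRequest(source=fork, target=original)
added = request.merge()
print(original.commits)
```

Note that the contributor never writes to the original repository. Only the owner’s decision to merge does that.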

These three features – fork, pull request and merge – are what make GitHub so powerful. Gregg Pollack of Code School (which just launched a class called TryGit) explains that before GitHub, if you wanted to contribute to an open source project you had to manually download the project’s source code, make your changes locally, create a list of changes called a “patch” and then e-mail the patch to the project’s maintainer. The maintainer would then have to evaluate this patch, possibly sent by a total stranger, and decide whether to merge the changes.
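For readers who have never seen a patch, the following Python sketch builds one with the standard library’s difflib module. The file name greet.py and its contents are made up for the example; in practice a contributor would typically generate the patch with a tool such as diff or Git itself rather than by hand.

```python
import difflib

# The maintainer's version of a (hypothetical) file, as a list of lines.
original = [
    "def greet(name):\n",
    "    print('hello ' + name)\n",
]

# The contributor's locally modified version.
modified = [
    "def greet(name):\n",
    "    print('Hello, ' + name + '!')\n",
]

# unified_diff yields the patch line by line; join it into one string
# that could be pasted into an e-mail to the maintainer.
patch = "".join(
    difflib.unified_diff(
        original,
        modified,
        fromfile="a/greet.py",
        tofile="b/greet.py",
    )
)
print(patch)
```

Lines starting with a minus sign are removed and lines starting with a plus sign are added. The maintainer has to read this list of changes, decide whether to trust it and apply it by hand.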

This is where the network effect starts to play a role in GitHub, Pollack explains. When you submit a pull request, the project’s maintainer can see your profile, which includes all of your contributions on GitHub. If your patch is accepted, you get credit on the original site, and it shows up in your profile. It’s like a resume that helps the maintainer determine your reputation. The more people and projects on GitHub, the better picture a project maintainer can get of potential contributors. Patches can also be publicly discussed.

Even for maintainers who don’t end up using the GitHub interface, GitHub can make contribution management easier. “I end up just downloading the patch anyway, or merging from the command line instead of from the merge button,” says Isaac Schlueter, the maintainer of the open source development platform Node.js. “But GitHub provides a centralized place where people can discuss the patch.”

Lowering the barrier to entry democratizes open source development, and helps young projects grow. “Node.js wouldn’t be what it is today without GitHub,” Schlueter says.

Besides its public-facing open source repositories, GitHub also sells private repositories and on-premise instances of its software for enterprises. These solutions obviously can’t take full advantage of GitHub’s network effect, but they can take advantage of the collaboration features. That’s how GitHub makes money, but it’s not alone in this market.

Atlassian acquired a competitor called Bitbucket in 2010. And earlier this year Atlassian launched Stash, a product that enables you to host private, on-premise Git repositories with Bitbucket/GitHub-style collaboration features. The company also sells developer collaboration tools like the bug tracker JIRA and the wiki Confluence. Competition from Atlassian, which took $60 million in funding from Accel Partners in 2010, could help explain why GitHub took this round of funding, and hint at some possible future directions for the company. For example, Schlueter says GitHub’s issue tracking feature could eventually compete with JIRA for some projects.

The money may be in private and on-premise hosting, but the love is in the public repositories. Perhaps most importantly, GitHub has become the Library of Alexandria for code examples. Since Git encourages granular recording of changes, programmers, be they absolute beginners or experts, can trace the steps of some of the greatest developers in the world and find out how they solved thorny problems. But if GitHub were ever to meet the same fate as the Library of Alexandria, it could be reconstructed from all those local forks distributed on so many developers’ laptops all over the world. Regardless of how this investment works out, that’s a hell of a legacy for the GitHub team to leave behind.