Greplin: 1.5 Billion Documents Indexed, Six Engineers

Late last year we first mentioned Y Combinator startup Greplin, which indexes your social stuff in the cloud, making all your Facebook, Gmail, LinkedIn, Google Calendar, Evernote, Twitter, Dropbox and just about everything else searchable. The easiest way to describe it is “the other half of search.”

They opened their doors to customers in February. The company won’t talk about total user numbers yet, which isn’t surprising. But we have dug one interesting data point out of founder Daniel Gross: they’ve now indexed some 1.5 billion documents, and they’re indexing about 30 million new documents per day.

What this means: when you join Greplin, you authorize it to index various social apps and services. A typical user may sign up and start off by authorizing Greplin to index Facebook, Twitter and Gmail, for example. Greplin then grabs everything in those services (all your Facebook messages and updates, all your Twitter updates and DMs, all your Gmail messages back and forth, etc.) and lets you search them. When you add up all those documents for all users, you get to that big number, 1.5 billion.

To put this into perspective, that’s about the size of Google’s web-wide index in 2001. Or 60 times the size of Google’s original 1998 index of 25 million documents.

On the daily side, Greplin’s 30 million new documents a day is about 25% of Twitter’s current load (and Twitter gets off easy with 140-character documents). It’s not an apples-to-apples comparison, but it gives you some idea of the scale that they’re already reaching. And remember, they launched in February.
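
For readers who want to check the comparisons, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names are ours. The implied Twitter volume and the doubling time are derived values, not numbers Gross gave us, and the doubling time assumes the daily rate stays flat, which it almost certainly won't.

```python
# Figures quoted in the article.
greplin_total_docs = 1_500_000_000   # documents indexed so far
greplin_docs_per_day = 30_000_000    # new documents indexed per day
google_1998_index = 25_000_000       # Google's original 1998 index size
twitter_share = 0.25                 # Greplin's daily volume as a fraction of Twitter's

# How many times larger than Google's 1998 index?
ratio_to_google_1998 = greplin_total_docs / google_1998_index

# Derived, not stated in the article: the Twitter daily volume implied by "about 25%".
implied_twitter_per_day = greplin_docs_per_day / twitter_share

# Assumption: the ingest rate stays constant at today's level.
days_to_double = greplin_total_docs / greplin_docs_per_day

print(f"Times Google's 1998 index: {ratio_to_google_1998:.0f}")
print(f"Implied Twitter documents per day: {implied_twitter_per_day:,.0f}")
print(f"Days to double the index at the current rate: {days_to_double:.0f}")
```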

And all that with just six engineers and one support person, says Gross. He has Amazon Web Services to thank for that, although the recent outage didn’t make him too happy.