Web applications require a lot of data storage. All the videos uploaded to YouTube, for example, are estimated to take up more than 500 terabytes of storage. Google’s servers overall process one petabyte of data every hour or so. Google had to create its own Web-scale file system to handle all the data that it processes and stores. As Web-scale computing and the needs of plain-old enterprise storage grow, many more companies are wishing they had a file system like Google’s.
On Monday, a startup called ParaScale is launching a private beta of commercial-grade storage software that takes an approach similar to Google’s own in-house system. (ParaScale nearly made it into TechCrunch50 this year, but fell just shy of the cut, largely because it was no longer in stealth mode.) It offers a file system that can run on a cluster of off-the-shelf Linux servers.
Companies can keep adding as many servers as they need, with each one acting as a redundant node. The software runs on the cluster as a whole, treating it as one giant file system. This creates private cloud storage that companies can host themselves inside their own firewalls. ParaScale CEO Sajai Krishnan says customers can expect to pay about $1 per gigabyte, depending on their server costs.
That compares with 15 cents per gigabyte per month from Amazon’s S3 Web storage service, not counting what customers pay for inbound and outbound bandwidth. At that rate, S3’s cumulative fees pass $1 per gigabyte in the seventh month, so after a little more than six months a customer would end up paying more for Amazon S3.
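The break-even arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough back-of-the-envelope illustration, not ParaScale's or Amazon's pricing model: it assumes the $1 per gigabyte is a one-time cost, uses the 15-cent monthly S3 rate quoted above, and ignores bandwidth, power, administration, and hardware replacement. The function name `break_even_month` is made up for this example.

```python
def break_even_month(one_time_cents_per_gb: int, monthly_cents_per_gb: int) -> int:
    """Return the first month in which cumulative monthly fees exceed the one-time cost.

    Prices are whole cents per gigabyte so the arithmetic stays exact.
    """
    if one_time_cents_per_gb < 0 or monthly_cents_per_gb <= 0:
        raise ValueError("one-time cost must be >= 0 and monthly cost must be > 0")
    # After n months the hosted service has cost n * monthly_cents_per_gb.
    # The smallest n with n * monthly > one_time is floor(one_time / monthly) + 1.
    return one_time_cents_per_gb // monthly_cents_per_gb + 1


# Figures from the article: about $1.00 per GB up front vs. $0.15 per GB per month.
PARASCALE_CENTS_PER_GB = 100  # assumed to be a one-time cost
S3_CENTS_PER_GB_MONTH = 15    # storage only, bandwidth excluded

month = break_even_month(PARASCALE_CENTS_PER_GB, S3_CENTS_PER_GB_MONTH)
print(f"Hosted storage becomes more expensive in month {month}")
```

With these inputs the hosted option has cost 90 cents per gigabyte after six months and $1.05 after seven, which is where the "about six months" figure comes from. Changing either constant shows how sensitive the crossover is to server costs.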
But ParaScale’s private beta won’t be available to all comers. Krishnan is looking for about 20 initial beta customers with serious storage needs, in addition to the five alpha customers who have been trying the software for the past 18 months. His ideal customer is:
. . . somebody with 30 terabytes of storage, growing at 10 to 20 terabytes a year. If you don’t have that, go with NetApp and you will be pretty happy.
Customers can apply for the beta trial here, and those that get in receive the first four terabytes of storage management for free.
The kinds of applications that make sense for ParaScale include video hosting, applications that crawl the Web and create huge log files, or corporate databases that are simply getting out of hand. Maybe an enterprising enterprise customer will use ParaScale to set up its own storage cloud service to compete with S3. ParaScale itself is competing against Amazon and RackSpace on the hosted cloud storage side, and with storage appliance vendors such as NetApp and EMC on the data center side.
Architecturally, its approach is closer to the Google File System that underpins MapReduce, or to Hadoop, the open-source counterpart. Krishnan doesn’t really see these as competition. He says:
They are okay for top-ten vendors who have the horsepower and Stanford grads to tune these things. It takes six months for a Google engineer to figure out the MapReduce mechanisms.
With ParaScale, one IT administrator can manage hundreds of server nodes running ParaScale’s software. ParaScale raised $11.4 million in a series A round last May from Charles River Ventures and Menlo Ventures.