Cloudera pulls sensitive files from its ‘open by design’ cloud servers

Enterprise cloud giant Cloudera has pulled several of its cloud storage servers offline, despite initially claiming the servers were “open by design,” after a security researcher found sensitive internal files inside.

Chris Vickery, director of risk research at security firm UpGuard, found the cloud storage servers, known as buckets, hosted on Amazon Web Services in late July. The buckets largely contained legacy Hortonworks data from before Hortonworks' $5.2 billion all-stock merger with Cloudera in January 2019.

When reached, Cloudera spokesperson Madge Miller told TechCrunch that the buckets were supposed to be open and contained files and code that were open to its customers, users and the wider community. The company said, however, that it identified three files that contained confidential information, which were removed from the buckets.

But soon after, the company reversed its position and pulled the buckets offline altogether.

Vickery, who shared his findings exclusively with TechCrunch, said that although the vast majority of files in the cloud buckets were for public and community consumption, he also found files containing credentials, account access tokens, passwords and other secrets for Cloudera’s internal Jenkins system, which the company uses for building and testing its software projects. The buckets also contained entire SQL dumps of its internal build databases, Vickery said.

A “secrets” file containing passwords and credentials for Cloudera’s internal systems. (Image: UpGuard/supplied)

Cloudera confirmed the security lapse in a later email to TechCrunch.

“Thanks to the questions from the security researcher, we did a deep dive and found some credentials and SQL dumps in the public buckets which should not have been placed there. The credentials were for our internal Jenkins build process and the SQL dumps were of our build database,” the spokesperson said.

“We have since removed this information from the public buckets and taken further remediation steps by changing credentials and rotating keys. We also concluded we could close access to a few unused publicly accessible buckets.”

The company said that the sensitive data, since removed, did not contain any customer data or any other personally identifiable information.

In all, the security lapse could have been worse — even if the incident could have been avoided altogether.

But Vickery said the incident was important to disclose as it reveals the inherent risk in using overwhelmingly large cloud storage containers. In other words, the buckets were so big and held so many files that it became nearly impossible to notice when something sensitive was added by mistake.

“When that many directories and files of varying format are all stashed away together, it becomes all too easy for something to be mistakenly put among them and remain unnoticed, as is what appears to have happened here,” wrote Vickery.
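
One way to reduce that risk is to scan a bucket's file listing automatically for names that look sensitive. The Python sketch below is illustrative only and is not how Cloudera or UpGuard reviewed the buckets. The patterns and the sample file paths are hypothetical, and fetching a real listing through an AWS SDK is left out.

```python
import re
from typing import Iterable, List

# Hypothetical name patterns that often indicate sensitive files.
# A real review would tune these and also inspect file contents.
SUSPICIOUS_PATTERNS = [
    # File name (last path component) containing "secret"
    re.compile(r"(^|/)[^/]*secret[^/]*$", re.IGNORECASE),
    # File name containing "credential", "password" or "token"
    re.compile(r"(^|/)[^/]*(credential|password|token)[^/]*$", re.IGNORECASE),
    # Database dumps and private key files, judged by extension
    re.compile(r"\.(sql|sql\.gz|pem|key)$", re.IGNORECASE),
]


def flag_suspicious_keys(keys: Iterable[str]) -> List[str]:
    """Return the object keys that match any suspicious pattern, in input order."""
    return [k for k in keys if any(p.search(k) for p in SUSPICIOUS_PATTERNS)]


# Hypothetical listing, standing in for the keys of a large public bucket.
sample_keys = [
    "releases/2.6/hdp.tar.gz",
    "build/jenkins/secrets.yml",
    "backups/build_db.sql.gz",
    "docs/README.txt",
]

flagged = flag_suspicious_keys(sample_keys)
for key in flagged:
    print("review before publishing:", key)
```

A name-based check like this only catches files whose names give them away, so it would complement, not replace, a review of what gets uploaded to public buckets in the first place.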