Major Cloudflare bug leaked sensitive data from customers’ websites

Cloudflare revealed a serious bug in its software today that caused sensitive data such as passwords, cookies and authentication tokens to spill in plaintext from its customers’ websites. The announcement is a major blow for the content delivery network, which offers enhanced security and performance for more than 5 million websites.

This could have allowed anyone who noticed the error to collect a wide range of highly personal information that is normally encrypted or obscured.

Remediation was complicated by an additional wrinkle. Search engines had automatically cached some of the data, which made cleanup particularly difficult: Cloudflare had to ask Google, Bing, Yahoo and other search engines to scrub it manually.

The leak may have been active as early as Sept. 22, 2016, almost five months before a security researcher at Google’s Project Zero discovered it and reported it to Cloudflare.

The most severe leakage, however, occurred between Feb. 13 and Feb. 18, when about 1 in every 3,300,000 HTTP requests to Cloudflare sites would have caused data to be exposed. Attackers could have accessed the data in real time, or later through search engine caches.

Cloudflare notes in its announcement of the issue that even at its peak, data leaked in only about 0.00003% of requests. That doesn’t sound like much, but Cloudflare’s massive customer base includes categories like dating websites and password managers, which host particularly sensitive data.
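For a sense of scale, the two ways the peak rate is stated agree: one leak per 3,300,000 requests works out to roughly 0.00003 percent. A minimal Python check, using only the figures quoted in this article:

```python
# Convert the reported peak leak rate (about 1 in every 3,300,000 HTTP
# requests) into a percentage of requests.
PEAK_REQUESTS_PER_LEAK = 3_300_000  # figure quoted in the article

def leak_percentage(requests_per_leak: int) -> float:
    """Return the share of requests that leaked, expressed as a percentage."""
    return 100.0 / requests_per_leak

pct = leak_percentage(PEAK_REQUESTS_PER_LEAK)
print(f"{pct:.7f}%")  # prints 0.0000303%
```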

“At the peak, we were doing 120,000 leakages of a piece of information, for one request, per day,” Cloudflare chief technology officer John Graham-Cumming told TechCrunch. He emphasized that not all of those leakages would have contained secret information. “It’s random stuff in there because it’s random memory,” he said.
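The two peak figures can be combined into a rough back-of-envelope estimate. It rests on an assumption that neither Cloudflare nor this article states: that the 1-in-3,300,000 rate and the 120,000-leaks-per-day figure describe the same peak period. Under that assumption, multiplying them gives the request volume they jointly imply. This is not a traffic number Cloudflare reported.

```python
# Back-of-envelope estimate. ASSUMPTION: the 1-in-3,300,000 peak rate and the
# 120,000 leaks/day figure describe the same peak period. The result is an
# implied volume, not a traffic figure reported by Cloudflare.
REQUESTS_PER_LEAK = 3_300_000
LEAKS_PER_DAY = 120_000
SECONDS_PER_DAY = 24 * 60 * 60

implied_requests_per_day = REQUESTS_PER_LEAK * LEAKS_PER_DAY
implied_requests_per_second = implied_requests_per_day / SECONDS_PER_DAY

print(f"{implied_requests_per_day:,} requests/day")     # 396,000,000,000 requests/day
print(f"{implied_requests_per_second:,.0f} requests/s")  # 4,583,333 requests/s
```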

The bug occurred in an HTML parser that Cloudflare uses to improve website performance. The parser prepares sites for distribution on Google’s AMP publishing platform and upgrades HTTP links to HTTPS. Three Cloudflare features (email obfuscation, Server-side Excludes and Automatic HTTPS Rewrites) were not properly integrated with the parser, causing random chunks of data to be exposed.
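Automatic HTTPS Rewrites, one of the features named above, is easy to picture in miniature. The sketch below is a toy, not Cloudflare’s code: it uses a regular expression where a production system would use a real HTML parser, and the allowlist `HTTPS_CAPABLE_HOSTS` and the function `rewrite_links` are hypothetical. The allowlist reflects an assumption that a link should only be upgraded when its target host is known to serve the same content over HTTPS, since blindly rewriting every link could break pages.

```python
import re

# Hypothetical allowlist of hosts assumed to serve the same content over HTTPS.
# Illustrative only; not Cloudflare's actual data or code.
HTTPS_CAPABLE_HOSTS = {"example.com", "cdn.example.com"}

# Matches http:// URLs at the start of href/src attribute values and captures
# the attribute prefix and the host. A regex is a stand-in for a real parser.
LINK_RE = re.compile(r'((?:href|src)=["\'])http://([^/"\']+)')

def rewrite_links(html: str) -> str:
    """Upgrade http:// links to https:// for hosts on the allowlist."""
    def upgrade(match: re.Match) -> str:
        prefix, host = match.group(1), match.group(2)
        if host.lower() in HTTPS_CAPABLE_HOSTS:
            return f"{prefix}https://{host}"
        return match.group(0)  # leave links to unknown hosts untouched
    return LINK_RE.sub(upgrade, html)

page = '<a href="http://example.com/a">A</a> <img src="http://other.org/x.png">'
print(rewrite_links(page))
# <a href="https://example.com/a">A</a> <img src="http://other.org/x.png">
```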

Ultimately, even Cloudflare itself was affected by the bug. “One obvious piece of information that had leaked was a private key used to secure connections between Cloudflare machines,” Graham-Cumming wrote in Cloudflare’s announcement. The key let the company’s own machines communicate with each other securely; that internal encryption was put in place in 2013 in response to concerns about government surveillance.

Graham-Cumming emphasized that Cloudflare had found no evidence that hackers had discovered or exploited the bug, noting that the company would have seen unusual activity on its network if an attacker had been trying to pull data from particular websites.

“It was a bug in the thing that understands HTML,” Graham-Cumming explained. “We understand the modifications to web pages on the fly and they pass through us. In order to do that, we have the web pages in memory on the computer. It was possible to keep going past the end of the web page into memory you shouldn’t be looking at.”
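What Graham-Cumming describes is a classic buffer overrun (an over-read): a parser runs past the end of the page it is working on and keeps reading whatever sits next to it in memory. The Python sketch below simulates that with a single bytes object standing in for memory, in which a malformed page that ends mid-tag sits next to someone else’s data. The scanner, its rule of skipping the byte after `=` (as if consuming an expected opening quote) and the sample bytes are all invented for illustration. The one detail drawn from outside this article is that Cloudflare’s published incident report attributed the real bug to an end-of-buffer check written with `==` where `>=` was needed; that is the difference the `safe` flag toggles.

```python
# Toy model of an end-of-buffer overrun. One bytes object stands in for
# process memory: the page being parsed sits next to unrelated data.
# Names, layout and parsing rules are illustrative, not Cloudflare's code.
PAGE = b"<html><body><a href="   # malformed: ends right after '='
NEIGHBOUR = b"|Cookie:SECRET|"      # someone else's data in adjacent memory
MEMORY = PAGE + NEIGHBOUR

def scan(memory: bytes, page_len: int, safe: bool) -> bytes:
    """Copy page bytes out of `memory`, stopping at the end of the page.

    After an '=' the scanner skips one byte (an expected opening quote),
    so it can advance two positions at once. With the equality check
    (safe=False) that jump can step over the end index, and the loop
    keeps copying the neighbouring data.
    """
    out = bytearray()
    p, pe = 0, page_len
    while p < len(memory):  # toy-only guard; unchecked C code would have none
        at_end = p >= pe if safe else p == pe
        if at_end:
            break
        out.append(memory[p])
        p += 2 if memory[p] == ord("=") else 1
    return bytes(out)

leaked = scan(MEMORY, len(PAGE), safe=False)
fixed = scan(MEMORY, len(PAGE), safe=True)
print(leaked)  # b'<html><body><a href=Cookie:SECRET|'
print(fixed)   # b'<html><body><a href='
```

In the buggy run the pointer jumps from the final `=` straight past the page end, so the equality test never fires; the `>=` test catches the same jump.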

Cloudflare’s teams in San Francisco and London handed off shifts to one another, working around the clock to fix the bug once it was reported. They stopped the most severe leakage within seven hours. It took six days for the company to fully fix the bug and to work with search engines to scrub the data.

Tavis Ormandy, an engineer at Google, first noticed the bug, which he jokingly called “Cloudbleed” in reference to the Heartbleed vulnerability. He said in a blog post that he encountered unexpected data during a project and wondered at first if there was a bug in his own code. Upon further testing, he realized the leak was coming from Cloudflare.

“We fetched a few live samples, and we observed encryption keys, cookies, passwords, chunks of POST data and even HTTPS requests for other major Cloudflare-hosted sites from other users,” Ormandy wrote. “This situation was unusual, [personally identifiable information] was actively being downloaded by crawlers and users during normal usage, they just didn’t understand what they were seeing.” Ormandy added that he later destroyed the samples because of the sensitive information they contained, but he posted redacted screenshots of some of the information leaked from Uber, Fitbit and OkCupid.

Beyond the samples Ormandy collected, it’s not clear what other information may have leaked. “It’s very hard to say, because this information is transient,” Graham-Cumming said. But Ormandy says his samples revealed highly sensitive data.

“We keep finding more sensitive data that we need to cleanup. I didn’t realize how much of the internet was sitting behind a Cloudflare CDN until this incident,” Ormandy wrote. “I’m finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We’re talking full HTTPS requests, client IP addresses, full responses, cookies, passwords, keys, data, everything.”

Although Cloudflare worked with Ormandy to address the issue, he contends that the company’s final blog post on the matter “severely downplays the risk to customers.” Ormandy also expressed frustration that Cloudflare didn’t move faster in the remediation process.

But Graham-Cumming says it wouldn’t have been possible for Cloudflare to work any more quickly than it did. He also says that Ormandy called Cloudflare’s disclosure “completely acceptable” when he reviewed a copy.

“This is subject to a 90 day disclosure. We were disclosing after six days,” Graham-Cumming said. “He’s saying he’s frustrated but I’m a little bemused at why he’s frustrated with six days rather than 90. We would have disclosed even earlier, but because some of this info had been cached, we thought we had a duty to clean that up before it became public. There was a danger that info would persist in search engines like Google.”

Graham-Cumming said that Cloudflare customers like Uber and OkCupid weren’t directly notified of the data leaks because of the security risks involved in the situation. “There was no backdoor communication outside of Cloudflare — only with Google and other search engines,” he said.