Colaboratory, or Colab for short, spun out from an internal Google Research project in late 2017. It’s designed to allow anyone to write and execute arbitrary Python code through a web browser, particularly code for machine learning, education and data analysis. To that end, Google provides both free and paying Colab users access to hardware including GPUs and Google’s custom-designed, AI-accelerating tensor processing units (TPUs).
In recent years, Colab has become the de facto platform for demos within the AI research community. It’s not uncommon for researchers who’ve written code to include links to Colab pages on or alongside the GitHub repositories hosting the code. But Google hasn’t historically been very restrictive when it comes to Colab content, potentially opening the door for actors who wish to use the service for less scrupulous purposes.
Not all code triggers the warning. This reporter was able to run one of the more popular deepfake Colab projects without issue, and Reddit users report that another leading project, FaceSwap, remains fully functional. This suggests enforcement is based on a blacklist rather than on keywords, and that the onus will be on the Colab community to report code that runs afoul of the new rule.
“We regularly monitor avenues for abuse in Colab that run counter to Google’s AI principles, while balancing supporting our mission to give our users access to valuable resources such as TPUs and GPUs. Deepfakes were added to our list of activities disallowed from Colab runtimes last month in response to our regular reviews of abusive patterns,” a Google spokesperson told TechCrunch via email. “Deterring abuse is an ever-evolving game, and we cannot disclose specific methods as counterparties can take advantage of the knowledge to evade detection systems. In general, we have automated systems that detect and prohibit many types of abuse.”
Archive.org data shows that Google quietly updated the Colab terms sometime in mid-May. The previous restrictions on things like running denial-of-service attacks, password cracking and downloading torrents were left unchanged.
Deepfakes come in many forms, but one of the most common is video in which a person’s face has been convincingly pasted on top of another face. Unlike the crude Photoshop jobs of yesteryear, AI-generated deepfakes can, in some cases, match a person’s body movements, microexpressions and skin tones better than Hollywood-produced CGI.
Deepfakes can be harmless — even entertaining — as countless viral videos have shown. But they’re increasingly being used by hackers to target social media users in extortion and fraud schemes. More nefariously, they’ve been leveraged in political propaganda, for example to create videos of Ukrainian President Volodymyr Zelenskyy giving a speech about the war in Ukraine that he never actually gave.
From 2019 to 2021, the number of deepfakes online grew from roughly 14,000 to 145,000, according to one source. Forrester Research estimated in October 2019 that deepfake fraud scams would cost $250 million by the end of 2020.
“When it comes to deepfakes specifically, the issue that’s most relevant is an ethical one: dual use,” Vagrant Gautam, a computational linguist at Saarland University in Germany, told TechCrunch via email. “It’s a bit like thinking about guns, or chlorine. Chlorine is useful to clean stuff but it’s also been used as a chemical weapon. So we deal with that by first thinking about how bad the tech is and then, e.g., agree on the Geneva Protocol that we won’t use chemical weapons on each other. Unfortunately, we don’t have industry-wide consistent ethical practices regarding machine learning and AI, but it makes sense for Google to come up with its own set of conventions regulating the access to and ability to create deepfakes, especially since they’re often used to disinform and to spread fake news — which is a problem that’s bad and continues to get worse.”
Os Keyes, a Ph.D. candidate at Seattle University, also approved of Google’s move to ban deepfake projects from Colab, but noted that more must be done on the policy side to prevent their creation and spread.
“The way that it has been done certainly highlights the poverty of relying on companies self-policing,” Keyes told TechCrunch via email. “Deepfake generation should absolutely not be an acceptable form of work, well, anywhere, and so it’s good that Google is not making itself complicit in that … But the ban doesn’t occur in a vacuum — it occurs in an environment where actual, accountable, responsive regulation of these kinds of development platforms (and companies) is lacking.”
Others, particularly those who benefited from Colab’s previously laissez-faire approach to governance, might not agree. Years ago, AI research lab OpenAI initially declined to open-source a language-generating model, GPT-2, out of fear that it would be misused. This motivated groups like EleutherAI to leverage tools including Colab to develop and release their own language-generating models, ostensibly for research.
When I spoke to Connor Leahy, a member of EleutherAI, last year, he asserted that the commoditization of AI models is part of an “inevitable trend” in the falling price of the production of “convincing digital content” that won’t be derailed whether or not the code is released. In his view, AI models and tools should be made widely available so that “low-resource” users, especially academics, can study them more closely and perform their own safety-focused research on them.
“Deepfakes have a large potential to run counter to Google’s AI principles. We aspire to be able to detect and deter abusive deepfake patterns versus benign ones, and will alter our policies as our methods progress,” the spokesperson continued. “Users wishing to explore synthetic media projects in a benign way are encouraged to talk to a Google Cloud representative to vet their use case and explore the suitability of other managed compute offerings in Google Cloud.”