Google adds generative AI threats to its bug bounty program

Google has expanded its vulnerability rewards program (VRP) to include attack scenarios specific to generative AI.

In an announcement shared with TechCrunch ahead of publication, Google said: “We believe expanding the VRP will incentivize research around AI safety and security and bring potential issues to light that will ultimately make AI safer for everyone.”

Google’s vulnerability rewards program (or bug bounty) pays ethical hackers for finding and responsibly disclosing security flaws.

Given that generative AI raises new security issues, such as the potential for unfair bias or model manipulation, Google said it sought to rethink how the bugs it receives should be categorized and reported.

The tech giant says it’s doing this by using findings from its newly formed AI Red Team, a group of hackers that simulates a variety of adversaries, ranging from nation-states and government-backed groups to hacktivists and malicious insiders, to hunt down security weaknesses in technology. The team recently conducted an exercise to determine the biggest threats to the technology behind generative AI products like ChatGPT and Google Bard.

The team found, for example, that large language models (or LLMs) are vulnerable to prompt injection attacks, whereby a hacker crafts adversarial prompts that can influence the behavior of the model. An attacker could use this type of attack to generate text that is harmful or offensive or to leak sensitive information. The team also warned of another type of attack called training-data extraction, which allows hackers to reconstruct verbatim training examples to extract personally identifiable information or passwords from the data.

Both of these types of attacks are covered in the scope of Google’s expanded VRP, along with model manipulation and model theft attacks, but Google says it will not offer rewards to researchers who uncover bugs related to copyright issues or data extraction that reconstructs non-sensitive or public information.

The monetary rewards will vary depending on the severity of the vulnerability discovered. Researchers can currently earn up to $31,337 if they find command injection attacks and deserialization bugs in highly sensitive applications, such as Google Search or Google Play. If the flaws affect apps that have a lower priority, the maximum reward is $5,000.

Google says that it paid out more than $12 million in rewards to security researchers in 2022.