Twitter drafts a deepfake policy that would label and warn, but not always remove, manipulated media

Twitter last month said it was introducing a new policy to help fight deepfakes and other “manipulated media”: photos, videos or audio that have been significantly altered to change their original meaning or purpose, or that make it seem like something happened that actually did not. Today, Twitter is sharing a draft of its new policy and opening it up for public input before it goes live.

The policy is meant to address the growing problem with deepfakes on today’s internet.

Deepfakes have proliferated thanks to advances in artificial intelligence that have made it easier to produce convincing fake videos, audio and other digital content. Anyone with a computer and internet connection can now create this sort of fake media. The technology can be dangerous when used as propaganda, or to make someone believe something is real when it is not. In politics, deepfakes can be used to undermine a candidate’s reputation by making them appear to say and do things they never said or did.

A deepfake of Facebook CEO Mark Zuckerberg went viral earlier this year, after Facebook refused to pull down a doctored video, tweeted by Trump, that showed House Speaker Nancy Pelosi stumbling over her words.

In early October, two members of the Senate Intelligence Committee, Mark Warner (D-VA) and Marco Rubio (R-FL), called on major tech companies to develop a plan to combat deepfakes on their platforms. The senators asked 11 tech companies — including Facebook, Twitter, YouTube, Reddit and LinkedIn — to come up with a plan to develop industry standards for “sharing, removing, archiving, and confronting the sharing of synthetic content as soon as possible.”

Later in the month, Twitter announced its plans to seek public feedback on the policy. Meanwhile, Amazon joined up with Facebook and Microsoft to support the Deepfake Detection Challenge (DFDC), which aims to develop new approaches to detect manipulated media.

Today, Twitter is detailing a draft of its deepfakes policy. The company says that when it sees synthetic or manipulated media that’s intentionally trying to mislead or confuse people, it will:

place a notice next to Tweets that share synthetic or manipulated media;

warn people before they share or like Tweets with synthetic or manipulated media; or

add a link – for example, to a news article or Twitter Moment – so that people can read more about why various sources believe the media is synthetic or manipulated.

Twitter says if a deepfake could threaten someone’s physical safety or lead to serious harm, it may also remove it.

The company is accepting feedback by way of a survey as well as on Twitter itself, by way of the #TwitterPolicyFeedback hashtag.

The survey asks questions like whether altered photos and videos should be removed entirely, carry warning labels or not be removed at all. It also asks whether certain actions are acceptable, like hiding tweets or alerting people if they’re about to share a deepfake, and when Twitter should remove a tweet with misleading media. Twitter’s draft policy says a tweet may be removed if it threatens someone’s physical safety, but will otherwise be labeled. The survey suggests some other times a tweet could be pulled, such as when it threatens someone’s mental health, privacy, dignity, property and more.

The survey takes five minutes to complete and is available in English, Japanese, Portuguese, Arabic, Hindi and Spanish.

What isn’t clear, however, is how Twitter will be able to detect the deepfakes published on its platform, given that detection techniques aren’t perfect and often lag behind the newer and more advanced creation methods. On this front, Twitter invites those who want to partner with it on detection solutions to fill out a form.

Twitter is accepting feedback on its deepfakes policy from now until Wednesday, November 27 at 11:59 p.m. GMT. At that time, it will review the feedback received and make adjustments to the policy, as needed. The policy will then be incorporated into Twitter’s Rules with a 30-day notice before the change goes live.