Twitter runs a test prompting users to revise 'harmful' replies

In its latest effort to deal with rampant harassment on its platform, Twitter will look into giving users a second chance before they tweet. In a new feature the company is testing, users who use “harmful” language will see a prompt suggesting that they self-edit before posting a reply.

When things get heated, you may say things you don't mean. To let you rethink a reply, we’re running a limited experiment on iOS with a prompt that gives you the option to revise your reply before it’s published if it uses language that could be harmful.

— Twitter Support (@TwitterSupport) May 5, 2020

The framing here is a bit disingenuous: harassment on Twitter certainly isn’t limited to otherwise well-meaning people acting in the “heat of the moment.” Still, anything that can reduce toxicity on the platform is probably better than what we’ve got now.

In response to questions from TechCrunch, Twitter confirmed that the test feature will be limited to replies for now. Twitter also explained that its systems will detect harmful language based on the kind of language used in other tweets that have been reported.

Last year at F8, Instagram rolled out a similar test for its users that would “nudge” them with a warning before they post a potentially offensive comment. In December, the company offered an update on its efforts. “Results have been promising, and we’ve found that these types of nudges can encourage people to reconsider their words when given a chance,” the company wrote in a blog post.

This kind of thing is particularly relevant right now, as companies conduct moderation across their massive platforms with relative skeleton crews. All of the major social networks have announced an increased reliance on AI detection as the pandemic keeps tech workers away from the office. In Facebook’s case, content moderators are among the employees the company would like to bring back first.
