Spotting sockpuppets with science

If you’ve ever ventured into the comment section of a website or spent any time on forums or social media, you’ve probably encountered sockpuppets: multiple fake accounts controlled by a single person. It’s possible you didn’t know it at the time. New research may help ID these overeager commentators automatically, which is good news for anyone hoping for saner discussion across the web.

Srijan Kumar from the University of Maryland led a team in statistically analyzing everything about sockpuppet accounts, from how they write and interact with each other to the user names with which they’re registered. Their findings were presented this week at the World Wide Web Conference in Perth.

The data came from sites that use Disqus as a commenting platform; the company provided “a complete trace of user activity across nine communities that consisted of 2,897,847 users, 2,129,355 discussions, and 62,744,175 posts.”

The researchers found several traits of sockpuppets that are interesting in their own right and also useful for identifying them. The accounts tend to be active around the same time and in the same threads, but seldom start new discussions. Their user names vary widely, but the account emails are often almost identical. They have certain linguistic characteristics that set them apart from normal users: more “I” and “you,” and generally worse grammar. And they’re mostly focused on current affairs.

By measuring these (and dozens more) factors, the team was able to tell whether an account was a sockpuppet about two-thirds of the time. More interestingly, the approach was 91 percent accurate in determining whether two accounts belonged to the same “puppetmaster.”
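To give a rough sense of what measuring such factors could look like, here is a minimal Python sketch of two of the signals mentioned above: how often a comment uses “I” and “you,” and how similar two email addresses are. This is not the researchers’ code or method. The function names, the word-splitting rule, and the use of the standard library’s `difflib.SequenceMatcher` as a similarity measure are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical feature: the pronouns the article says sockpuppets overuse.
PRONOUNS = {"i", "you"}


def pronoun_rate(text: str) -> float:
    """Fraction of words in `text` that are exactly "I" or "you".

    Contractions such as "I'm" count as a single word and are NOT matched;
    that is a simplification of this sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in PRONOUNS for word in words) / len(words)


def email_similarity(a: str, b: str) -> float:
    """Similarity of two email addresses, from 0.0 (nothing shared) to 1.0 (identical).

    SequenceMatcher is a stand-in chosen for illustration; the study's
    actual measure is not described in the article.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    print(pronoun_rate("I think you are wrong"))
    print(email_similarity("jsmith1@example.com", "jsmith2@example.com"))
```

A real system would combine many such numbers per account and hand them to a classifier. These two alone would not be enough to unmask anyone.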

In the illustration at top you see a visualization of the comments at AV Club; blue dots are users and red ones are sockpuppets, which tend to cluster together because they interact with each other more often. They’re also more central because they are more active than ordinary users.

It’s quite a distance from an automated sockpuppet unmasker, but this data (no doubt shared in kind with Disqus in addition to being published) should help moderators and admins make more informed decisions when trying to make sense of the chaos that is online discourse. Soon it might even be safe to read the comments… well, probably not.

If you’re curious about the other aspects of sockpuppetry unearthed by the study, you can check out the full version of the paper here — apart from the statistics, it’s quite readable.