After being hauled into Washington to be grilled over how its social media platform was exploited to influence public opinion around elections and politics in the US and elsewhere, Twitter vowed to be more open with its data in an attempt to do better in the future. The idea is partly that others can provide more insight into what nefarious groups did on Twitter, and partly that people will be less inclined to mistrust Twitter’s own intentions to keep such activity off its platform. Now it’s making good on some of that.
Today, Twitter released a set of data files detailing Tweets and other actions taken by more than 4,500 accounts on the site linked to state-backed information operations, specifically: 3,841 accounts associated with Russia’s Internet Research Agency and 770 other accounts “potentially originating in Iran.”
While Twitter had revealed the account numbers previously, this is the first time that it’s unveiling actual Tweets and more from the data trove behind them. The files total more than 360 gigabytes and include more than 10 million Tweets; more than 2 million images, GIFs, videos, and Periscope broadcasts; and a list of data points detailing more about those accounts and their Tweets, including how many followers they had and whom they followed, the geolocation of their Tweets, polls that were run, and more.
The trove goes back as far as these accounts do. Some accounts date back as far as 2009, Twitter said. That in itself is very interesting: it could be a sign that nefarious actors were hijacking dormant accounts, or a measure of the long game that the most malicious groups play.
These files are for those who want to take a peek at just what groups like Russia’s Internet Research Agency and people out of Iran got up to on Twitter, but especially for those who might be able to map out and make better sense of the data than Twitter has done up to now.
“It is clear that information operations and coordinated inauthentic behavior will not cease,” write Vijaya Gadde and Yoel Roth, respectively Twitter’s legal, policy and trust and safety lead, and head of site integrity. “These types of tactics have been around for far longer than Twitter has existed — they will adapt and change as the geopolitical terrain evolves worldwide and as new technologies emerge. For our part, we are committed to understanding how bad-faith actors use our services. We will continue to proactively combat nefarious attempts to undermine the integrity of Twitter, while partnering with civil society, government, our industry peers, and researchers to improve our collective understanding of coordinated attempts to interfere in the public conversation.”
Twitter notes that the data does not include Tweets deleted prior to suspension, although it also said that these would account for less than 1 percent of overall activity.
Going forward, Twitter said that it plans to release similar datasets as and when it identifies further such operations, “in a timely fashion after we complete our investigations,” and that it may also release incremental datasets if they appear to be significant and to have a material impact.
In cases where accounts have fewer than 5,000 followers, Twitter said that it has hashed identifying fields like user ID and screen name in the public files linked today. “While we’ve taken every possible precaution to ensure there are no false positives in these datasets, we’ve hashed these fields to reduce the potential negative impact on real or compromised accounts — while still enabling longitudinal research, network analysis, and assessment of the underlying content created by these accounts.”
To that end, it’s also including a form for people to fill out if they feel they’ve been included in error.
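For readers curious how hashing can mask identities while still allowing network analysis, here is a minimal Python sketch of the general idea. Twitter has not described its exact scheme in the material quoted above, so the use of salted SHA-256, the salt value, and the field names `user_id`, `screen_name`, and `follower_count` are all assumptions made for illustration.

```python
import hashlib

# From the article: identifying fields are hashed for accounts
# with fewer than 5,000 followers.
FOLLOWER_THRESHOLD = 5000


def pseudonymize(value: str, salt: str) -> str:
    """Return a deterministic SHA-256 hex digest of salt + value.

    The salted SHA-256 scheme is an assumption, not Twitter's documented method.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_account(record: dict, salt: str = "example-salt") -> dict:
    """Return a copy of the record, hashing identifying fields for small accounts.

    Field names here are hypothetical and chosen only for this sketch.
    """
    masked = dict(record)
    if record["follower_count"] < FOLLOWER_THRESHOLD:
        for field in ("user_id", "screen_name"):
            masked[field] = pseudonymize(str(record[field]), salt)
    return masked


# Two made-up records: one below the threshold, one above it.
small = {"user_id": "12345", "screen_name": "example_a", "follower_count": 120}
large = {"user_id": "67890", "screen_name": "example_b", "follower_count": 80000}

masked_small = mask_account(small)
masked_large = mask_account(large)
```

Because the same input always yields the same digest, every Tweet from a masked account still carries the same opaque identifier. That is what lets researchers trace one account over time and map its connections without learning who is behind it.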