A massive database of current U.S. Immigration and Customs Enforcement (ICE) employees scraped from public LinkedIn profiles has been removed from the tech platforms hosting the data. The project was undertaken by Sam Lavigne, a self-described artist, programmer and researcher, in response to recent revelations around ICE’s detention practices at the southern U.S. border.
Lavigne posted the database to GitHub on Tuesday and by Wednesday the repository had been removed. The database included the name, profile photo, title and city area of every ICE employee who listed the agency as their employer on the professional networking site. A more in-depth version of the data pulled all public LinkedIn data from the pool of users, including previous employment, education history and any other information those users opted to make public. The total database lists this information for 1,595 ICE employees, from the agency’s CTO on down to low-level workers and interns.
The database was accompanied by a Medium post explaining the project’s aims, which the platform has since removed:
While I don’t have a precise idea of what should be done with this data set, I leave it here with the hope that researchers, journalists and activists will find it useful…
I find it helpful to remember that as much as internet companies use data to spy on and exploit their users, we can at times reverse the story, and leverage those very same online platforms as a means to investigate or even undermine entrenched power structures. It’s a strange side effect of our reliance on private companies and semi-public platforms to mediate nearly all aspects of our lives.
The data set appears to have violated GitHub and Medium guidelines against doxing. Medium’s anti-harassment policy specifically forbids doxing and defines it broadly, preventing “the aggregation of publicly available information to target, shame, blackmail, harass, intimidate, threaten, or endanger.”
Because it doesn’t include personally identifying information like home addresses, phone numbers or other non-public details, Lavigne’s project isn’t really doxing in the normal sense of the word, though that hasn’t made it less controversial.
GitHub’s own policy leading to the data’s removal is less clear, though the company told The Verge the repository was removed due to “doxxing and harassment.” The platform’s terms of service forbid uses of GitHub that “violate the privacy of any third party, such as by posting another person’s personal information without consent.” This leaves some room for interpretation, and it is not clear that data from a public-facing social media profile is “personal” under this definition. GitHub allows researchers to scrape data from external sites in order to aggregate it “only if any publications resulting from that research are open access.”
While Lavigne’s aggregation efforts were deemed off-limits by some tech platforms, they do raise compelling questions. What kinds of public data, in aggregate, run afoul of anti-harassment rules? Why can this kind of data be scraped for the purposes of targeted advertising or surveillance by law enforcement but not be collected in a user-facing way? The ICE database raised these questions and plenty more, but for some tech companies the question of hosting the data proved too provocative from the start.