There’s been a lot of news about neural network-generated images, from deepfakes and AI-generated porn to a thousand more innocent uses. It makes sense that people have started getting curious: Were my photos used to train the robots? Are photos of me in the image-generating training sets? A brand new site tries to give you an answer.
Spawning AI creates image-generation tools for artists, and the company just launched Have I Been Trained?, a site you can use to search a set of 5.8 billion images that have been used to train popular AI art models. When you enter a search term, the site shows the closest-matching images from the LAION-5B training data, which is widely used for training AI models.
It’s a fun tool to play with, and it may give a glimpse into the data that the AI uses as the basis for its own images. The photo at the top of this post is a screenshot of the results for the search term “couple.” Try putting your own name in, and see what happens… I also tried a search for “Obama,” which I will not be sharing a screenshot of here, but suffice it to say that these training sets can be… Problematic.
An Ars Technica report this week reveals that private medical records, possibly thousands of them, are among the many photos hidden within LAION-5B whose ethical and legal status is questionable. Removing these records is exceptionally difficult, as LAION isn’t a collection of files itself but merely a set of URLs pointing to images on the web.
In response, technologists like Mat Dryhurst and Holly Herndon are spearheading efforts such as Source+, a standard that aims to let people opt out of having their work or likeness used for AI training purposes. But these standards are voluntary, and will likely remain so, which limits their potential impact.