AI tradeoffs: Balancing powerful models and potential biases

Andrea Gagliano, Contributor

Andrea Gagliano is head of data science at Getty Images.

The sneaky origins of bias

Today’s AI models are often pre-trained and open source, which allows researchers and companies alike to implement AI quickly and tailor it to their specific needs.

While this approach makes AI more commercially available, it has a real downside: a handful of models now underpin the majority of AI applications across industries and continents. These models often carry undetected or unknown biases, which means developers who adapt them for their own applications are building on a fragile foundation.

According to a recent study by Stanford’s Center for Research on Foundation Models, any biases in these foundational models, or in the data they are built on, are inherited by the applications built on top of them, which creates the potential for those biases to be amplified.

For example, YFCC100M is a publicly available data set from Flickr that is commonly used to train models. When you examine the images of people in this data set, you’ll see that their geographic distribution is heavily skewed toward the U.S., which means people from other regions and cultures are underrepresented.

Skews like this in training data produce AI models with under- or overrepresentation biases in their output, such as output dominated by white or Western cultures. When multiple data sets are combined into a larger training set, transparency suffers, and it becomes increasingly difficult to know whether you have a balanced mix of people, regions and cultures. It’s no surprise, then, that AI models trained this way are sometimes published with egregious biases built in.

Further, when foundational AI models are published, there is typically little to no information provided about their limitations. Testing for potential issues is left to the end user, a step that is often overlooked. Without transparency and a complete understanding of a particular data set, it’s challenging to detect the limitations of an AI model, such as lower performance for women, children or people in developing nations.

At Getty Images, we evaluate whether bias is present in our computer vision models with a series of tests that include images of real, lived experiences, including people with varying levels of abilities, gender fluidity and health conditions. We can’t catch every bias, but we recognize the importance of visualizing an inclusive world, so we work to understand the biases that may exist and to confront them when we can.
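One common way to run this kind of test, and to surface problems such as lower performance for women or children, is disaggregated evaluation: compute a model’s accuracy separately for each group in a labeled test set, then compare the groups. The sketch below is illustrative only and is not Getty Images’ actual test suite. The `(group, predicted, actual)` record format, the example labels and the 5-percentage-point gap threshold are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical evaluation record: (group label, model prediction, ground truth).
# In practice the group label might come from self-identified metadata.
Record = tuple[str, str, str]


def accuracy_by_group(records: list[Record]) -> dict[str, float]:
    """Compute accuracy separately for each group (disaggregated evaluation)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


def flag_gaps(per_group: dict[str, float], max_gap: float = 0.05) -> list[str]:
    """Return groups whose accuracy trails the best group by more than max_gap.

    The 0.05 default is an illustrative assumption, not a standard threshold.
    """
    best = max(per_group.values())
    return sorted(g for g, acc in per_group.items() if best - acc > max_gap)


if __name__ == "__main__":
    records = [
        ("adult", "smiling", "smiling"),
        ("adult", "neutral", "neutral"),
        ("adult", "smiling", "smiling"),
        ("adult", "neutral", "neutral"),
        ("child", "smiling", "smiling"),
        ("child", "neutral", "smiling"),
        ("child", "smiling", "smiling"),
        ("child", "smiling", "neutral"),
    ]
    scores = accuracy_by_group(records)
    print(scores)             # {'adult': 1.0, 'child': 0.5}
    print(flag_gaps(scores))  # ['child']
```

A single overall accuracy figure for these eight records would be 75%, which hides the fact that the model is right only half the time for the hypothetical "child" group; reporting per-group numbers is what makes that gap visible.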

Leveraging metadata to mitigate biases

So, how do we do this? When working with AI at Getty Images, we start by reviewing the breakdown of people across a training data set, including age, gender and ethnicity.

Fortunately, we’re able to do this because we require a model release for the creative content that we license. This allows us to include self-identified information in our metadata (i.e., a set of data that describes other data), which enables our AI team to automatically search across millions of images and quickly identify skews in the data. Open source data sets are often limited by a lack of metadata, a problem that is exacerbated when combining data sets from multiple sources to create a larger pool.
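As a rough illustration of what searching metadata for skews can look like, the sketch below counts how often each value of a metadata attribute (here, a hypothetical `region` field) appears across image records and flags values that fall below a minimum share. Records missing the field are counted as "unknown" so that gaps in metadata stay visible. The record format, field name and 10% floor are assumptions for illustration; a real threshold would depend on the population a model is meant to serve.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical image record: metadata field name -> value.
ImageRecord = Mapping[str, str]


def attribute_shares(records: Iterable[ImageRecord], field: str) -> dict[str, float]:
    """Return the share of records for each value of `field`.

    Records without the field are counted as "unknown", so missing metadata
    shows up in the breakdown instead of silently disappearing.
    """
    counts = Counter(r.get(field, "unknown") for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


def underrepresented(shares: dict[str, float], min_share: float = 0.10) -> list[str]:
    """List known values whose share is below `min_share` (an illustrative floor)."""
    return sorted(v for v, s in shares.items() if s < min_share and v != "unknown")


if __name__ == "__main__":
    records = (
        [{"region": "north_america"}] * 14
        + [{"region": "europe"}] * 3
        + [{"region": "africa"}]
        + [{}] * 2  # images with no region metadata
    )
    shares = attribute_shares(records, "region")
    print(shares)
    print(underrepresented(shares))  # ['africa']
```

Running the same check over a combined data set where many records land in "unknown" is one quick way to see the transparency problem described above: the larger that bucket, the less the rest of the breakdown can be trusted.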

But let’s be realistic: Not all AI teams have access to expansive metadata, and ours isn’t perfect either. An inherent tradeoff exists: larger training data sets lead to more powerful models, but at the expense of understanding the skews and biases in that data.

As an AI industry, it’s crucial that we find a way to overcome this tradeoff, given how many industries and people around the world now depend on AI. The key is a greater focus on data-centric AI, a movement that is beginning to take stronger hold.

Where do we go from here?

Confronting biases in AI is no small feat and will take collaboration across the tech industry in the coming years. However, there are precautionary steps that practitioners can take now to make small but notable changes.

For example, when foundational models are published, we could release a corresponding data sheet describing the underlying training data, with descriptive statistics of what is in the data set. Doing so would give subsequent users a sense of a model’s strengths and limitations, empowering them to make informed decisions. The impact could be huge.

The aforementioned study on foundational models poses the question, “What is the right set of statistics over the data to provide adequate documentation, without being too costly or difficult to obtain?” For visual data specifically, researchers would ideally provide the distributions of age, gender, race, religion, region, abilities, sexual orientation, health conditions and more. But this metadata is costly and difficult to obtain for large data sets drawn from multiple sources.
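To make the idea more concrete, here is a minimal sketch of the descriptive-statistics portion of such a data sheet. For each attribute it reports the distribution of values and the attribute’s coverage, meaning the fraction of records that carry it at all. Low coverage is itself worth publishing, because it tells users how much of the data set the distribution actually describes. The field names and output layout are assumptions for illustration, not an established data sheet format.

```python
import json
from collections import Counter
from typing import Mapping, Sequence


def datasheet_stats(records: Sequence[Mapping[str, str]], fields: Sequence[str]) -> dict:
    """Summarize per-field value distributions and metadata coverage.

    `coverage` is the fraction of records that have the field; the
    `distribution` is computed only over those records.
    """
    n = len(records)
    stats: dict = {"num_records": n, "fields": {}}
    for field in fields:
        values = [r[field] for r in records if field in r]
        counts = Counter(values)
        stats["fields"][field] = {
            "coverage": len(values) / n if n else 0.0,
            "distribution": {v: c / len(values) for v, c in counts.most_common()},
        }
    return stats


if __name__ == "__main__":
    # Hypothetical records; real ones would come from self-identified metadata.
    records = [
        {"age_group": "18-29", "region": "europe"},
        {"age_group": "30-44", "region": "north_america"},
        {"age_group": "30-44"},
        {"region": "asia"},
    ]
    print(json.dumps(datasheet_stats(records, ["age_group", "region"]), indent=2))
```

Emitting the summary as JSON is just one convenient choice; the point is that the statistics travel with the model so downstream users can check them before building on it.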

A complementary approach would be for AI developers to have access to a running list of known biases and common limitations for foundational models. This could include a database of easily accessible bias tests that AI researchers regularly contribute to, particularly tests that reflect how people actually use these models.

For example, Twitter recently ran a competition that challenged AI experts to expose biases in its algorithms. Recognition and awareness are key to mitigation, and we need more of this, everywhere. Crowdsourcing like this on a regular basis could help reduce the burden on individual practitioners.

We don’t have all of the answers yet, but as an industry, we need to take a hard look at our reliance on ever more data as the path to more powerful models. That approach comes at a cost, namely amplified biases, and we need to accept the role we play in the solution. We need to look for ways to understand the training data we use more deeply, especially when AI systems are used to represent or interact with real people.

This shift in thinking will help companies of all types and sizes spot skews quickly and counteract them during development, reducing the biases that make it into their models.