Why it’s so hard to create unbiased artificial intelligence

As artificial intelligence and machine learning mature and demonstrate their potential to take on complicated tasks, we've come to expect that machines can succeed where humans have failed, namely in putting aside personal biases when making decisions. But as recent cases have shown, like all disruptive technologies, machine learning introduces its own set of unexpected challenges and sometimes yields results that are wrong, unsavory, offensive and out of line with the moral and ethical standards of human society.

While some of these stories might sound amusing, they should make us ponder the implications of a future in which robots and artificial intelligence take on more critical responsibilities and will have to answer for the decisions they get wrong.

The inherent problem with machine learning

At its core, machine learning uses algorithms to parse data, extract patterns, and learn to make predictions and decisions based on the gleaned insights. These are the mechanics behind many of the technologies we see every day, such as search engines, face recognition apps and digital assistants. The more data you feed a machine learning algorithm, the smarter it gets; that's why every tech firm is looking for ways to gather more data about its customers and users.

But at the end of the day, a machine learning algorithm can only be as smart as the data that's fed to it, and therein lies the problem: the same data used to train an algorithm can also teach it to become evil or biased.
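
To make the point concrete, here is a minimal sketch using scikit-learn on a made-up, deliberately skewed toy dataset (nothing here reflects any real product or dataset): a model fitted to skewed examples dutifully reproduces the skew in its predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: column 0 is a "group" flag, column 1 is the signal
# we actually care about. In this skewed sample, group 1 is almost never
# labeled positive, regardless of the signal.
n = 1000
group = rng.integers(0, 2, size=n)
signal = rng.normal(size=n)
label = ((signal > 0) & (group == 0)).astype(int)  # biased labeling

X = np.column_stack([group, signal])
model = LogisticRegression().fit(X, label)

# Two candidates with an identical signal but different group flags:
same_signal = np.array([[0, 1.0], [1, 1.0]])
print(model.predict_proba(same_signal)[:, 1])  # group 0 scores far higher
```

The algorithm is not malicious; it simply learned the pattern it was shown.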

Like children, machine learning algorithms tend to pick up the tastes and biases of the people who rear them. What makes things even more complex is that firms are very protective of the inner workings of their algorithms, treating them as trade secrets.

How can machine learning go wrong?

Beauty.ai, a machine learning startup, held the world’s first AI-driven beauty contest this year. More than 6,000 people submitted their pictures to have their attractiveness evaluated based on factors such as symmetry and wrinkles.

This was supposed to free the contest from the social biases of human judges. But the results turned out to be somewhat disappointing: Of the 44 winners, the majority were white, a few were Asian and only one was dark-skinned. The problem, as one researcher explained to Motherboard, was that the image samples used to train the algorithms weren't balanced in terms of race and ethnicity.

This is not the first time that machine learning's "white guy problem" has become an issue. Earlier this year, a language processing algorithm was found to judge white-sounding names such as Emily and Matt as more pleasant than black-sounding names such as Jamal and Ebony.
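
This kind of association can be measured directly. Below is an illustrative sketch of the idea; the tiny vectors are invented for demonstration (a real audit would load trained word embeddings such as word2vec or GloVe), but the cosine-similarity comparison is the same one bias researchers apply at scale.

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 3-d embeddings; only their relative geometry matters here.
vectors = {
    "emily":      np.array([0.9, 0.1, 0.2]),
    "jamal":      np.array([0.1, 0.9, 0.2]),
    "pleasant":   np.array([0.8, 0.2, 0.1]),
    "unpleasant": np.array([0.2, 0.8, 0.1]),
}

def association(name):
    # Positive score: the name sits closer to "pleasant" than to "unpleasant".
    return (cosine(vectors[name], vectors["pleasant"])
            - cosine(vectors[name], vectors["unpleasant"]))

for name in ("emily", "jamal"):
    print(name, round(association(name), 3))
```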

In another instance, Microsoft was forced to shut down Tay, the chatbot that was designed to mimic the behavior of a teenage girl, after it started spewing offensive tweets. Tay was supposed to ingest comments from users, process them and learn to respond in personalized ways. But it seems that users were more interested in teaching racism and Nazism to Tay.

But what happens in more sensitive situations, such as when a person's life or freedom is at stake? ProPublica research published in May found that an algorithm used by law enforcement in Florida assigned higher risk scores to black defendants when assessing the likelihood that convicts would commit crimes in the future.

The list of machine learning fiascos is nearly endless, including a Google algorithm that tagged photos of black people as gorillas, an ad delivery engine that is less likely to show high-paying job ads to women and a news algorithm that promotes fake and sometimes vulgar stories.

Who is responsible for the wrongdoings of machine learning?

Where traditional software is involved, deciding whether an error was due to a user mistake or a design flaw in the software is pretty straightforward.

But machine learning is not so transparent, and one of its biggest challenges is determining responsibility. Developing machine learning software is dramatically different from traditional coding: it is as much about training the algorithms as it is about writing the code. Even the creators cannot predict exactly how the machine will decide, and they are sometimes astonished by the results their creations yield.

Therefore, things got a bit murky when Facebook was accused of political bias in its “Trending Topics” module, a component that is at least partly powered by machine learning. And when Republican presidential nominee Donald Trump accused Google of tweaking its search results to suppress bad news about Hillary Clinton, transparently debunking the claim and explaining the mechanics proved complicated.

Things get more sensitive when critical decisions are conferred to artificial intelligence. For instance, if a self-driving vehicle decides to run down a pedestrian, who will be held accountable? The driver — or more precisely, the owner — of the car or the developer of the machine learning algorithm?

How do you remove bias from machine learning algorithms?

Erasing bias from databases is the key to creating impartial machine learning algorithms. But creating a balanced database is itself a complicated feat. There is currently no regulation or standard governing the data used to train machine learning algorithms, and researchers sometimes use and share off-the-shelf frameworks and databases that already have bias ingrained in them.
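
As a simple illustration of what “balancing” can mean in practice, here is a minimal sketch, assuming a pandas table with a hypothetical group column, that audits group counts and naively upsamples the underrepresented group before training. Real-world fairness work is considerably more involved, but the audit step alone often exposes the problem.

```python
import pandas as pd

# Hypothetical training table; the column names are illustrative only.
df = pd.DataFrame({
    "group":   ["a"] * 900 + ["b"] * 100,
    "feature": range(1000),
})

print(df["group"].value_counts())  # audit: group "b" is badly underrepresented

# Naive mitigation: upsample every group to the size of the largest one.
target = df["group"].value_counts().max()
balanced = pd.concat([
    g.sample(target, replace=True, random_state=0)
    for _, g in df.groupby("group")
])
print(balanced["group"].value_counts())  # both groups now have 900 rows
```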

One solution would be to create shared and regulated databases that are not in the possession of any single entity, preventing any party from unilaterally manipulating the data in its own favor.

A notable effort in this regard is the founding of the Partnership on AI, a historic collaboration between Facebook, Amazon, Google, IBM and Microsoft, some of the leading voices in machine learning, to address the fears stemming from the growth of artificial intelligence and machine learning. The Partnership on AI's goals include addressing the ethical issues of artificial intelligence and making sure that a more diverse set of eyes looks at AI before it reaches the public.

Another interesting initiative is Elon Musk's OpenAI, the artificial intelligence company that aims to make AI more transparent, expand access to it and prevent it from becoming a tool for evil.

There will probably come a day when robots are intelligent enough to explain their own behavior and correct their mistakes. But we still have a long way to go. Until then, the onus is on us humans to prevent machine learning systems from amplifying the negative tendencies of the humans who train them. That can only be achieved if our efforts are combined, not siloed.