How the law got it wrong with Apple Card

Liz O'Sullivan

Contributor

Liz O’Sullivan is CEO of Parity, a platform that automates model risk and algorithmic governance for the enterprise. She also advises the Surveillance Technology Oversight Project and the Campaign to Stop Killer Robots on all things artificial intelligence.

Advocates of algorithmic justice have begun to see their proverbial “days in court” with legal investigations of enterprises like UHG and Apple Card. The Apple Card case is a strong example of how current anti-discrimination laws fall short of the fast pace of scientific research in the emerging field of quantifiable fairness.

While it may be true that Apple and their underwriters were found innocent of fair lending violations, the ruling came with clear caveats that should be a warning sign to enterprises using machine learning within any regulated space. Unless executives begin to take algorithmic fairness more seriously, their days ahead will be full of legal challenges and reputational damage.

What happened with Apple Card?

In late 2019, startup leader and social media celebrity David Heinemeier Hansson raised an important issue on Twitter, to much fanfare and applause. With almost 50,000 likes and retweets, he asked Apple and their underwriting partner, Goldman Sachs, to explain why he and his wife, who share the same financial ability, would be granted different credit limits. To many in the field of algorithmic fairness, it was a watershed moment to see the issues we advocate go mainstream, culminating in an inquiry from the NY Department of Financial Services (DFS).

At first glance, it may seem heartening to credit underwriters that the DFS concluded in March that Goldman’s underwriting algorithm did not violate the strict rules of financial access created in 1974 to protect women and minorities from lending discrimination. While disappointing to activists, this result was not surprising to those of us working closely with data teams in finance.

There are some algorithmic applications for financial institutions where the risks of experimentation far outweigh any benefit, and credit underwriting is one of them. We could have predicted that Goldman would be found innocent, because the laws for fairness in lending (if outdated) are clear and strictly enforced.

And yet, there is no doubt in my mind that the Goldman/Apple algorithm discriminates, along with every other credit scoring and underwriting algorithm on the market today. Nor do I doubt that these algorithms would fall apart if researchers were ever granted access to the models and data we would need to validate this claim. I know this because the NY DFS partially released its methodology for vetting the Goldman algorithm, and as you might expect, their audit fell far short of the standards held by modern algorithm auditors today.

How did DFS (under current law) assess the fairness of Apple Card?

In order to prove the Apple algorithm was “fair,” DFS considered first whether Goldman had used “prohibited characteristics” of potential applicants like gender or marital status. This one was easy for Goldman to pass — they don’t include race, gender or marital status as an input to the model. However, we’ve known for years now that some model features can act as “proxies” for protected classes.

The DFS methodology, based on 50 years of legal precedent, does not say whether the department considered this question, but we can guess that it did not. If it had, it would have quickly found that credit score is so tightly correlated with race that some states are considering banning its use for casualty insurance. Proxy features have only stepped into the research spotlight recently, giving us our first example of how science has outpaced regulation.
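To make the proxy idea concrete, here is a minimal sketch of one simple screen an auditor might run: measure how strongly each model input correlates with a protected attribute that is held back for auditing only. The data is synthetic and the feature names are hypothetical. Nothing here reflects Goldman's model or the DFS methodology, and a real audit would also need to look at combinations of features and at nonlinear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic audit data (assumption): 1 marks membership in a protected class.
# The label is used only for auditing and is never a model input.
protected = [0, 0, 0, 1, 1, 1]
feature_a = [1, 2, 3, 4, 5, 6]  # hypothetical input that tracks the label
feature_b = [1, 2, 3, 1, 2, 3]  # hypothetical input unrelated to the label

proxy_strength = {
    "feature_a": pearson(feature_a, protected),
    "feature_b": pearson(feature_b, protected),
}

for name, r in proxy_strength.items():
    print(f"{name}: correlation with protected class = {r:.2f}")
```

A feature with a correlation near 1 or -1 can stand in for the protected attribute even when that attribute is excluded from the model, which is why checking the input list alone proves little.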

In the absence of protected features, DFS then looked for credit profiles that were similar in content but belonged to people of different protected classes. In a certain imprecise sense, they sought to find out what would happen to the credit decision were we to “flip” the gender on the application. Would a female version of the male applicant receive the same treatment?

Intuitively, this seems like one way to define “fair.” And it is — in the field of machine learning fairness, there is a concept called a “flip test” and it is one of many measures of a concept called “individual fairness,” which is exactly what it sounds like. I asked Patrick Hall, principal scientist at bnh.ai, a leading boutique AI law firm, about the analysis most common in investigating fair lending cases. Referring to the methods DFS used to audit Apple Card, he called it basic regression, or “a 1970s version of the flip test,” bringing us example number two of our insufficient laws.
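For readers who want to see the idea in code, a flip test can be sketched in a few lines. This is an illustration only: `toy_credit_limit` is a hypothetical scoring rule, deliberately biased so the test has something to find, and the applicants are invented. It is not a reconstruction of the regression analysis DFS ran.

```python
def toy_credit_limit(applicant):
    """Hypothetical, deliberately biased scoring rule (not a real model)."""
    limit = applicant["income"] // 5
    if applicant["gender"] == "F":
        limit //= 2  # the bias the flip test should expose
    return limit


def fair_credit_limit(applicant):
    """Hypothetical rule that ignores gender entirely."""
    return applicant["income"] // 5


def flip_test(model, applicants, attribute, swap):
    """Return applicants whose outcome changes when only `attribute` is flipped."""
    flagged = []
    for applicant in applicants:
        flipped = dict(applicant)  # copy so the original record is untouched
        flipped[attribute] = swap[applicant[attribute]]
        if model(applicant) != model(flipped):
            flagged.append(applicant)
    return flagged


applicants = [
    {"name": "A", "income": 100_000, "gender": "F"},
    {"name": "B", "income": 100_000, "gender": "M"},
]
swap = {"F": "M", "M": "F"}

flagged_biased = flip_test(toy_credit_limit, applicants, "gender", swap)
flagged_fair = flip_test(fair_credit_limit, applicants, "gender", swap)
print(len(flagged_biased), len(flagged_fair))
```

Note the limitation: a model that never takes the attribute as an input passes this version of the test trivially, which is exactly why proxies matter.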

A new vocabulary for algorithmic fairness

Ever since Solon Barocas and Andrew Selbst’s seminal paper “Big Data’s Disparate Impact” in 2016, researchers have been hard at work translating core philosophical concepts into mathematical terms. Several conferences have sprung into existence, with new fairness tracks emerging at the most notable AI events. The field is in a period of hypergrowth, and the law has so far failed to keep pace. But just as happened with the cybersecurity industry, this legal reprieve won’t last forever.

Perhaps we can forgive DFS for its softball audit given that the laws governing fair lending are born of the civil rights movement and have not evolved much in the 50-plus years since inception. The legal precedents were set long before machine learning fairness research really took off. If DFS had been appropriately equipped to deal with the challenge of evaluating the fairness of the Apple Card, they would have used the robust vocabulary for algorithmic assessment that’s blossomed over the last five years.

The DFS report, for instance, makes no mention of measuring “equalized odds,” a notorious line of inquiry first made famous in 2018 by Joy Buolamwini, Timnit Gebru and Deb Raji. Their “Gender Shades” paper proved that facial recognition algorithms guess wrong on dark female faces more often than they do on subjects with lighter skin, and this reasoning holds true for many applications of prediction beyond computer vision alone.

Equalized odds would ask of Apple’s algorithm: Just how often does it predict creditworthiness correctly? How often does it guess wrong? Are there disparities in these error rates among people of different genders, races or disability status? According to Hall, these measurements are important, but simply too new to have been fully codified into the legal system.
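The questions above map directly onto a small calculation. The sketch below, on invented records, compares true-positive and false-positive rates across two groups and reports the gap between them. It assumes the auditor has both the real repayment outcome and a demographic label for each applicant, which, as discussed later, is often not the case in lending.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, actually_creditworthy, predicted_creditworthy)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        if actual and predicted:
            counts[group]["tp"] += 1
        elif actual and not predicted:
            counts[group]["fn"] += 1
        elif not actual and predicted:
            counts[group]["fp"] += 1
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            # Share of creditworthy applicants the model correctly approves.
            "tpr": c["tp"] / positives if positives else None,
            # Share of non-creditworthy applicants the model wrongly approves.
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


def equalized_odds_gaps(rates):
    """Largest between-group difference in TPR and in FPR (0 means equal)."""
    tprs = [r["tpr"] for r in rates.values() if r["tpr"] is not None]
    fprs = [r["fpr"] for r in rates.values() if r["fpr"] is not None]
    return {"tpr_gap": max(tprs) - min(tprs), "fpr_gap": max(fprs) - min(fprs)}


# Invented audit records for two hypothetical groups.
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", False, False), ("group_a", False, True),
    ("group_b", True, True), ("group_b", True, False),
    ("group_b", False, False), ("group_b", False, False),
]

rates = error_rates_by_group(records)
gaps = equalized_odds_gaps(rates)
print(rates)
print(gaps)
```

In this toy data the model approves every creditworthy applicant in one group but only half in the other. An audit that looks only at model inputs would not surface that kind of disparity.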

If it turns out that Goldman regularly underestimates female applicants in the real world, or assigns interest rates that are higher than Black applicants truly deserve, it’s easy to see how this would harm these underserved populations at national scale.

Financial services’ Catch-22

Modern auditors know that the methods dictated by legal precedent fail to catch nuances in fairness for intersectional combinations within minority categories — a problem that’s exacerbated by the complexity of machine learning models. If you’re Black, a woman and pregnant, for instance, your likelihood of obtaining credit may be lower than the average of the outcomes among each overarching protected category.

These underrepresented groups may never benefit from a holistic audit of the system without special attention paid to their uniqueness, given that minorities are by definition a smaller share of the sample. This is why modern auditors prefer “fairness through awareness” approaches that allow us to measure results with explicit knowledge of the demographics of the individuals in each group.
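A small worked example shows how an intersectional gap can hide inside category-level averages. The records below are invented, and the field names are assumptions chosen for illustration. The same function computes approval rates for one attribute at a time and for their combination, and it returns the sample size alongside each rate so that small groups stay visible.

```python
from collections import defaultdict


def approval_rates(records, keys):
    """Map each group, defined by the given keys, to (approval rate, sample size)."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group] += 1
        if record["approved"]:
            approved[group] += 1
    return {g: (approved[g] / totals[g], totals[g]) for g in totals}


def make_records(race, gender, n_approved, n_total):
    """Build invented applicant records for one subgroup."""
    return [
        {"race": race, "gender": gender, "approved": i < n_approved}
        for i in range(n_total)
    ]


records = (
    make_records("Black", "F", 1, 4)
    + make_records("Black", "M", 3, 4)
    + make_records("White", "F", 3, 4)
    + make_records("White", "M", 3, 4)
)

by_race = approval_rates(records, ["race"])
by_gender = approval_rates(records, ["gender"])
by_both = approval_rates(records, ["race", "gender"])

print(by_race[("Black",)], by_gender[("F",)], by_both[("Black", "F")])
```

Here Black applicants and women are each approved half the time, but Black women are approved only a quarter of the time, a gap that neither single-category view reveals.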

But there’s a Catch-22. In financial services and other highly regulated fields, auditors often can’t use “fairness through awareness,” because they may be prevented from collecting sensitive information from the start. The goal of this legal constraint was to prevent lenders from discriminating. In a cruel twist of fate, it now provides cover for algorithmic discrimination, giving us our third example of legal insufficiency.

The fact that we can’t collect this information hamstrings our ability to find out how models treat underserved groups. Without it, we might never prove what we know to be true in practice: full-time moms, for instance, will reliably have thinner credit files, because they don’t execute every credit-based purchase under both spousal names. Minority groups may be far more likely to be gig workers, tipped employees or participants in cash-based industries, leading to commonalities among their income profiles that are less common among the majority.

Importantly, these differences on the applicants’ credit files do not necessarily translate to true financial responsibility or creditworthiness. If it’s your goal to predict creditworthiness accurately, you’d want to know where the method (e.g., a credit score) breaks down.

What this means for businesses using AI

In Apple’s example, it’s worth mentioning a hopeful epilogue to the story where Apple made a consequential update to their credit policy to combat the discrimination that is protected by our antiquated laws. In Apple CEO Tim Cook’s announcement, he was quick to highlight a “lack of fairness in the way the industry [calculates] credit scores.”

Their new policy allows spouses or parents to combine credit files such that the weaker credit file can benefit from the stronger. It’s a great example of a company thinking ahead to steps that may actually reduce the discrimination that exists structurally in our world. In updating their policies, Apple got ahead of the regulation that may come as a result of this inquiry.

This is a strategic advantage for Apple, because NY DFS made exhaustive mention of the insufficiency of current laws governing this space, meaning updates to regulation may be nearer than many think. To quote Superintendent of Financial Services Linda A. Lacewell: “The use of credit scoring in its current form and laws and regulations barring discrimination in lending are in need of strengthening and modernization.” In my own experience working with regulators, this is something today’s authorities are very keen to explore.

I have no doubt that American regulators are working to improve the laws that govern AI, taking advantage of this robust vocabulary for equality in automation and math. The Federal Reserve, OCC, CFPB, FTC and Congress are all eager to address algorithmic discrimination, even if their pace is slow.

In the meantime, we have every reason to believe that algorithmic discrimination is rampant, largely because the industry has also been slow to adopt the language of academia that the last few years have brought. Little excuse remains for enterprises failing to take advantage of this new field of fairness, and to root out the predictive discrimination that is in some ways guaranteed. And the EU agrees, with draft laws that apply specifically to AI that are set to be adopted some time in the next two years.

The field of machine learning fairness has matured quickly, with new techniques discovered every year and myriad tools to help. The field is only now reaching a point where these techniques can be prescribed with some degree of automation. Standards bodies have stepped in to provide guidance to lower the frequency and severity of these issues, even if American law is slow to adopt it.

Because whether or not discrimination by algorithm is intentional, it is illegal. So, anyone using advanced analytics for applications relating to healthcare, housing, hiring, financial services, education or government is likely breaking these laws without knowing it.

Until clearer regulatory guidance becomes available for the myriad applications of AI in sensitive situations, the industry is on its own to figure out which definitions of fairness are best.
