Why is AI so bad at spelling? Because image generators aren’t actually reading text

AI is seemingly unstoppable, but it can’t spell ‘burrito’

11:47 AM PDT • March 21, 2024

Firefly photograph of a street sign on a busy road near a billboard that says hello — **Image Credits:** Adobe Firefly

AIs are easily acing the SAT, defeating chess grandmasters and debugging code like it’s nothing. But put an AI up against some middle schoolers at the spelling bee, and it’ll get knocked out faster than you can say diffusion.

For all the advancements we’ve seen in AI, it still can’t spell. If you ask text-to-image generators like DALL-E to create a menu for a Mexican restaurant, you might spot some appetizing items like “taao,” “burto” and “enchida” amid a sea of other gibberish.

And while ChatGPT might be able to write your papers for you, it’s comically incompetent when you prompt it to come up with a 10-letter word without the letters “A” or “E” (it told me, “balaclava”). Meanwhile, when a friend tried to use Instagram’s AI to generate a sticker that said “new post,” it created a graphic that appeared to say something that we are not allowed to repeat on TechCrunch, a family website.

**Image Credits:** Microsoft Designer (DALL-E 3)

“Image generators tend to perform much better on artifacts like cars and people’s faces, and less so on smaller things like fingers and handwriting,” said Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute.

The underlying technologies behind image and text generators are different, yet both kinds of models have similar struggles with details like spelling. Image generators generally use diffusion models, which reconstruct an image from noise. When it comes to text generators, large language models (LLMs) might seem like they’re reading and responding to your prompts like a human brain, but they’re actually using complex math to match the prompt’s pattern with one in their latent space, letting them continue the pattern with an answer.

“The diffusion models, the latest kind of algorithms used for image generation, are reconstructing a given input,” Hadgu told TechCrunch. “We can assume writings on an image are a very, very tiny part, so the image generator learns the patterns that cover more of these pixels.”
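
To put rough numbers on that intuition, here is a minimal sketch in Python. The image and text dimensions are illustrative assumptions, not figures from the researchers or from any particular model, and the helper `region_share` is hypothetical. It only shows how small a share of an image's pixels a short line of lettering can occupy.

```python
def region_share(image_w: int, image_h: int, region_w: int, region_h: int) -> float:
    """Fraction of an image's pixels that fall inside a rectangular region."""
    return (region_w * region_h) / (image_w * image_h)

# Assumed sizes: a 512x512 image with a 64x16 strip of lettering on a sign.
text_share = region_share(512, 512, 64, 16)

print(f"Text covers {text_share:.2%} of the pixels")  # Text covers 0.39% of the pixels
```

If every pixel counted equally toward a training objective (a simplifying assumption), the lettering in this example would account for well under 1% of it, which is one way to read the point about patterns that cover more pixels.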

The algorithms are incentivized to recreate something that looks like what they’ve seen in their training data, but they don’t natively know the rules that we take for granted: that “hello” is not spelled “heeelllooo,” and that human hands usually have five fingers.

“Even just last year, all these models were really bad at fingers, and that’s exactly the same problem as text,” said Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta. “They’re getting really good at it locally, so if you look at a hand with six or seven fingers on it, you could say, ‘Oh wow, that looks like a finger.’ Similarly, with the generated text, you could say, that looks like an ‘H,’ and that looks like a ‘P,’ but they’re really bad at structuring these whole things together.”

Engineers can ameliorate these issues by augmenting their data sets with training models specifically designed to teach the AI what hands should look like. But experts don’t foresee these spelling issues resolving as quickly.

“You can imagine doing something similar — if we just create a whole bunch of text, they can train a model to try to recognize what is good versus bad, and that might improve things a little bit. But unfortunately, the English language is really complicated,” Guzdial told TechCrunch. And the issue becomes even more complex when you consider how many different languages the AI has to learn to work with.

Some models, like Adobe Firefly, are taught to just not generate text at all. If you input something simple like “menu at a restaurant,” or “billboard with an advertisement,” you’ll get an image of a blank paper on a dinner table, or a white billboard on the highway. But if you put enough detail in your prompt, these guardrails are easy to bypass.

“You can think about it almost like they’re playing Whac-A-Mole, like, ‘Okay a lot of people are complaining about our hands — we’ll add a new thing just addressing hands to the next model,’ and so on and so forth,” Guzdial said. “But text is a lot harder. Because of this, even ChatGPT can’t really spell.”

On Reddit, YouTube and X, a few people have uploaded videos showing how ChatGPT fails at spelling in ASCII art, an early internet art form that uses text characters to create images. In one recent video, which was called a “prompt engineering hero’s journey,” someone painstakingly tries to guide ChatGPT through creating ASCII art that says “Honda.” They succeed in the end, but not without Odyssean trials and tribulations.

“One hypothesis I have there is that they didn’t have a lot of ASCII art in their training,” said Hadgu. “That’s the simplest explanation.”

But at the core, LLMs just don’t understand what letters are, even if they can write sonnets in seconds.

“LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding,” Guzdial said. “When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”
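
A toy sketch in Python can make the idea of an encoding concrete. Everything here is an assumption for illustration: the four-word vocabulary, the ID numbers and the `encode` helper are made up, and production models generally rely on learned subword vocabularies rather than a whole-word lookup like this one.

```python
# Hypothetical vocabulary: each known word maps to an arbitrary integer ID.
vocab = {"the": 0, "burrito": 1, "is": 2, "ready": 3}

def encode(text: str) -> list[int]:
    """Turn a prompt into the list of IDs a model would receive in this toy setup."""
    return [vocab[word] for word in text.lower().split()]

ids = encode("The burrito is ready")
print(ids)  # [0, 1, 2, 3]

# From here on, only the integers are visible. Nothing in the number 1
# says that "burrito" has seven letters or contains two r's.
```

In this simplified picture, spelling is simply not part of the input: the letters are discarded before the model does any work.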

That’s why when you ask ChatGPT to produce a list of eight-letter words without an “O” or an “S,” it’s incorrect about half of the time. It doesn’t actually know what an “O” or “S” is (although it could probably quote you the Wikipedia history of the letter).
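
For contrast, the same kind of request is trivial for a program that works on individual characters. The sketch below is a hedged illustration in Python, not a description of how any chatbot works; the `satisfies` helper and the candidate word list are made up for the example.

```python
def satisfies(word: str, length: int, banned: str) -> bool:
    """True if `word` has exactly `length` letters and none of the banned letters."""
    letters = word.lower()
    return len(letters) == length and not any(ch in banned.lower() for ch in letters)

# Eight-letter words without an "O" or an "S", from an assumed candidate list.
candidates = ["umbrella", "mountain", "keyboard", "triangle", "sandwich"]
valid = [word for word in candidates if satisfies(word, 8, "os")]
print(valid)  # ['umbrella', 'triangle']

# ChatGPT's answer from earlier in this article: a 10-letter word without "A" or "E".
print(satisfies("balaclava", 10, "ae"))  # False
```

“Balaclava” fails on both counts: it has nine letters, and four of them are “A.”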

Though these DALL-E images of bad restaurant menus are funny, the AI’s shortcomings are useful when it comes to identifying misinformation. When we’re trying to see if a dubious image is real or AI-generated, we can learn a lot by looking at street signs, t-shirts with text, book pages or anything where a string of random letters might betray an image’s synthetic origins. And before these models got better at making hands, a sixth (or seventh, or eighth) finger could also be a giveaway.

But, Guzdial says, if we look closely enough, it’s not just fingers and spelling that AI gets wrong.

“These models are making these small, local issues all of the time — it’s just that we’re particularly well-tuned to recognize some of them,” he said.

To an average person, for example, an AI-generated image of a music store could be easily believable. But someone who knows a bit about music might see the same image and notice that some of the guitars have seven strings, or that the black and white keys on a piano are spaced out incorrectly.

Though these AI models are improving at an alarming rate, these tools are still bound to encounter issues like this, which limits the capacity of the technology.

“This is concrete progress, there’s no doubt about it,” Hadgu said. “But the kind of hype that this technology is getting is just insane.”
