While anticipation builds for GPT-4, OpenAI quietly releases GPT-3.5

Released two years ago, OpenAI’s remarkably capable, if flawed, GPT-3 was perhaps the first model to demonstrate that AI can write convincingly, if not perfectly, like a human. The successor to GPT-3, most likely called GPT-4, is expected to be unveiled in the near future, perhaps as soon as 2023. But in the meantime, OpenAI has quietly rolled out a series of AI models based on “GPT-3.5,” a previously unannounced, improved version of GPT-3.

GPT-3.5 broke cover on Wednesday with ChatGPT, a fine-tuned version of GPT-3.5 that’s essentially a general-purpose chatbot. Debuted in a public demo yesterday afternoon, ChatGPT can engage with a range of topics, including programming, TV scripts and scientific concepts.

According to OpenAI, GPT-3.5 was trained on a blend of text and code published prior to Q4 2021. Like GPT-3 and other text-generating AI, GPT-3.5 learned the relationships between sentences, words and parts of words by ingesting huge amounts of content from the web, including hundreds of thousands of Wikipedia entries, social media posts and news articles.

Rather than release the fully trained GPT-3.5, OpenAI used it to create several systems fine-tuned for specific tasks — each available through the OpenAI API. One — text-davinci-003 — can handle more complex instructions than models built on GPT-3, according to the lab, and is measurably better at both long-form and “high-quality” writing.
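As a rough illustration of what "available through the OpenAI API" means in practice, the sketch below builds (but does not send) an HTTP request for the completions endpoint. The endpoint URL and the `model`, `prompt`, `max_tokens` and `temperature` fields follow OpenAI's legacy completions API; the helper name, the parameter values and the placeholder key are assumptions made for this example, and the model may no longer be served.

```python
import json
import urllib.request

def build_completion_request(prompt, api_key, model="text-davinci-003"):
    """Build a POST request for the legacy completions endpoint without sending it."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 256,     # illustrative value, not a recommendation
        "temperature": 0.7,    # illustrative value, not a recommendation
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Placeholder key; a real key would be needed to actually send the request.
request = build_completion_request(
    "What is the philosophy behind WeWork?", api_key="YOUR_API_KEY"
)
```

Passing `request` to `urllib.request.urlopen` would submit it; that step is left out here so the sketch runs without credentials or network access.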

According to OpenAI data scientist Jan Leike, text-davinci-003 is similar but not identical to InstructGPT, a family of GPT-3-based models released by OpenAI earlier this year that are less likely to generate problematic (e.g., toxic and highly biased) text while more closely aligning with a user’s intent. Text-davinci-003 — and by extension GPT-3.5 — “scores higher on human preference ratings” while suffering from “less severe” limitations, Leike said in a tweet.

That appears to be the case anecdotally. Data scientists at Pepper Content, a content marketing platform, report that text-davinci-003 “performs better in understanding the ‘context’ behind a request and then using that to produce better content” while “hallucinating” less than GPT-3-based models. (Where it concerns text-generating AI, hallucination refers to an AI writing inconsistent, factually incorrect statements.)

In a test on OpenAI’s Playground website, which provides a UI frontend for the models, the Pepper Content team fed several prompts to text-davinci-003 and a model based on GPT-3 (text-davinci-002). Given “What is the philosophy behind WeWork?,” the GPT-3.5-based text-davinci-003 generated this:

WeWork is a global workspace provider that believes people are the most important asset in any organization. The philosophy of WeWork is to create a collaborative environment that enables people to work together in a flexible and efficient way. The company encourages collaboration and productivity, while providing a comfortable and inspiring space.

WeWork is also committed to being a socially responsible organization, by finding ways to reduce its environmental impact, by providing meaningful work experiences, and by promoting diversity and inclusion. WeWork also strives to create meaningful experiences for its members, through its unique community-based programming, events and activities. The company believes that when people work together in an inspiring and collaborative environment, they can achieve more and create meaningful change.

It’s not perfect — note the excess commas and repetitiveness. But the copy’s certainly more engaging than what the GPT-3-based text-davinci-002 produced:

At WeWork, we believe in openness, collaboration, and community. We have reimagined what a workspace can be by bringing together a global community of creators, entrepreneurs, and startups — anyone looking to build something meaningful and transform the world. We are proud to be your neighbour, coworker, and friend.

GPT-3.5 is also better at generating blog posts, it seems. Here’s what the Pepper Content team got when they prompted text-davinci-003 to write a post about picking a sofa:

If you are looking to give your green living room a fresh look, adding the the best sofa for the space is a great way to start. However, there is more to choosing the perfect sofa than just finding a piece that looks good. You want to find a piece that is comfortable and fits your green living room décor.

In this blog, we will discuss how to pick sofa far your green living room. We will look at the different types of sofas available, the design factors to consider when making your selection, and how to choose a sofa that fits your style and budget. We will also provide tips on how to maintain and care for your new sofa. By the end of this blog, you will have all the information you need to pick the perfect for your green living room.

Again, it isn’t foolproof: GPT-3.5 oddly added the bit about a “green living room.” But, again, the GPT-3-based model’s output is more basic and less grammatically correct:

Sofa is one of the most basic requirements in a living room. It’s not just a piece of furniture but an important part of the décor of your living room. So, what should be the criteria while picking a sofa? If you are wondering about this then stay with me as I discuss the different aspects of the sofa would help you in picking the best one for yourself.

Experiments beyond Pepper Content’s suggest that GPT-3.5 tends to be much more sophisticated and thorough in its responses than GPT-3. For example, when YouTube channel All About AI prompted text-davinci-003 to write a history of AI, the model’s output mentioned key luminaries in the field, including Alan Turing and Arthur Samuel, while text-davinci-002’s did not. All About AI also found that text-davinci-003 tended to have a more nuanced understanding of instructions, for instance providing details such as a title, description, outline, introduction and recap when asked to create a video script.

That’s no accident — a hallmark feature of text-davinci-003/GPT-3.5’s outputs is verboseness. (This writer can sympathize.) In an analysis, scientists at startup Scale AI found text-davinci-003/GPT-3.5 generates outputs roughly 65% longer than text-davinci-002/GPT-3 with identical prompts.

Perhaps less useful for most potential users but nonetheless entertaining, text-davinci-003/GPT-3.5 is better than its predecessor at composing songs, limericks and rhyming poetry. Ars Technica reports that commenters on Y Combinator’s Hacker News forum used text-davinci-003 to write a poem explaining Albert Einstein’s theory of relativity and then rewrite the poem in the style of John Keats. See:

If you want to understand Einstein’s thought
It’s not that hard if you give it a shot
General Relativity is the name of the game
Where space and time cannot remain the same
Mass affects the curvature of space
Which affects the flow of time’s race
An object’s motion will be affected
By the distortion that is detected
The closer you are to a large mass
The slower time will seem to pass
The farther away you may be
Time will speed up for you to see

The Scale AI team even found that text-davinci-003/GPT-3.5 has a notion of meters like iambic pentameter. See:

O gentle steeds, that bear me swift and sure
Through fields of green and pathways so obscure,
My heart doth swell with pride to be with you
As on we ride the world a-fresh to view
The wind doth whistle through our hair so free
And stirs a passion deep inside of me.
My soul doth lift, my spirits soar on high,
To ride with you, my truest friend, am I
Your strength and grace, your courage and your fire,
Inspire us both to go beyond our sire.
No earthly bonds can hold us, only fate,
To gallop on, our wond’rous course create

Relatedly, GPT-3.5 is wittier than GPT-3 — at least from a subjective standpoint. Asking text-davinci-002/GPT-3 to “tell a joke” usually yields this:

Why did the chicken cross the road? To get to the other side.

Text-davinci-003/GPT-3.5 has cleverer responses:

Q: What did the fish say when it hit the wall? A: Dam!

Q: What did one ocean say to the other ocean? A: Nothing, they just waved.

Scale AI had the model explain Python code in the style of Eminem, a feat which text-davinci-002/GPT-3 simply couldn’t accomplish:

Yo, so I’m loopin’ through this list
With each item that I find
I’m gonna print out every letter in each one of them
Dog, Cat, Banana, Apple, I’m gonna get’em all with this rhyme

So why is GPT-3.5 better than GPT-3 in these particular areas? We can’t know the exact answer without additional details from OpenAI, which aren’t forthcoming; an OpenAI spokesperson declined a request for comment. But it’s safe to assume that GPT-3.5’s training approach had something to do with it. Like InstructGPT, GPT-3.5 was trained with the help of human trainers who ranked and rated the way early versions of the model responded to prompts. This information was then fed back into the system, which tuned its answers to match the trainers’ preferences.
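The ranking step described above can be made concrete with a small sketch. OpenAI has not published GPT-3.5's training details, so this borrows, as an assumption, the pairwise preference loss described in the InstructGPT paper for training a reward model on human rankings. The function names, the example ranking and the reward scores are all hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the better response comes first.
    return list(combinations(ranked_responses, 2))

def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Compute -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher, and large when it disagrees with the human ranking.
    """
    return math.log1p(math.exp(-(reward_preferred - reward_rejected)))

# Hypothetical ranking from one human trainer, best first.
ranking = ["response_a", "response_b", "response_c"]
# Hypothetical scores from a reward model that is still being trained.
rewards = {"response_a": 1.2, "response_b": 0.3, "response_c": -0.8}

pairs = ranking_to_pairs(ranking)
mean_loss = sum(
    pairwise_preference_loss(rewards[better], rewards[worse])
    for better, worse in pairs
) / len(pairs)
```

In the InstructGPT setup, minimizing this loss teaches a reward model to agree with the trainers, and the language model is then tuned to produce responses that the reward model scores highly.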

Of course, this doesn’t make GPT-3.5 immune to the pitfalls to which all modern language models succumb. Because GPT-3.5 merely relies on statistical regularities in its training data rather than a human-like understanding of the world, it’s still prone to, in Leike’s words, “mak[ing] stuff up a bunch.” It also has limited knowledge of the world after 2021 because its training data is more sparse after that year. And the model’s safeguards against toxic output can be circumvented.

Still, GPT-3.5 and its derivative models demonstrate that GPT-4 — whenever it arrives — won’t necessarily need a huge number of parameters to best the most capable text-generating systems today. (Parameters are the parts of the model learned from historical training data and essentially define the skill of the model on a problem.) While some have predicted that GPT-4 will contain over 100 trillion parameters — nearly 600 times as many as GPT-3 — others argue that emerging techniques in language processing, like those seen in GPT-3.5 and InstructGPT, will make such a jump unnecessary.
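For a sense of scale, the "nearly 600 times" figure can be checked with a line of arithmetic. The article does not state GPT-3's size; the 175 billion used below is the parameter count OpenAI published for the largest GPT-3 model, and the 100 trillion figure is the unconfirmed prediction quoted above.

```python
GPT3_PARAMS = 175e9            # published size of the largest GPT-3 model
RUMORED_GPT4_PARAMS = 100e12   # unconfirmed prediction, not an OpenAI figure

ratio = RUMORED_GPT4_PARAMS / GPT3_PARAMS
print(f"{ratio:.0f}x")  # prints "571x"
```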

One of those techniques could involve browsing the web for greater context, à la Meta’s ill-fated BlenderBot 3.0 chatbot. John Schulman, a research scientist and co-founder of OpenAI, told MIT Technology Review in a recent interview that OpenAI is continuing work on WebGPT, a language model it announced late last year that can look up information on the web (via Bing) and give sources for its answers. At least one Twitter user appears to have found evidence of the feature undergoing testing for ChatGPT.

OpenAI has another reason to pursue lower-parameter models as it continues to evolve GPT-3: huge costs. A 2020 study from AI21 Labs pegged the expenses for developing a text-generating model with only 1.5 billion parameters at as much as $1.6 million. OpenAI has raised over $1 billion to date from Microsoft and other backers, and it’s reportedly in talks to raise more. But all investors, no matter how big, expect to see returns eventually.
