AI

Google’s best Gemini demo was faked

Comment

Image Credits: Google

Google’s new Gemini AI model is getting a mixed reception after its big debut yesterday, but users may have less confidence in the company’s tech or integrity after finding out that the most impressive demo of Gemini was pretty much faked.

A video called “Hands-on with Gemini: Interacting with multimodal AI” hit a million views over the last day, and it’s not hard to see why. The impressive demo “highlights some of our favorite interactions with Gemini,” showing how the multimodal model (i.e., it understands and mixes language and visual understanding) can be flexible and responsive to a variety of inputs.

To begin with, it narrates an evolving sketch of a duck from a squiggle to a completed drawing, which it says is an unrealistic color, then evinces surprise (“What the quack!”) when seeing a toy blue duck. It then responds to various voice queries about that toy, then the demo moves on to other show-off moves, like tracking a ball in a cup-switching game, recognizing shadow puppet gestures, reordering sketches of planets, and so on.

It’s all very responsive, too, though the video does caution that “latency has been reduced and Gemini outputs have been shortened.” So they skip a hesitation here and an overlong answer there, got it. All in all, it was a pretty mind-blowing show of force in the domain of multimodal understanding. My own skepticism that Google could ship a contender took a hit when I watched the hands-on.

Just one problem: The video isn’t real. “We created the demo by capturing footage in order to test Gemini’s capabilities on a wide range of challenges. Then we prompted Gemini using still image frames from the footage, and prompting via text.” (Parmy Olson at Bloomberg was the first to report the discrepancy.)

So although it might kind of do the things Google shows in the video, it didn’t, and maybe couldn’t, do them live and in the way they implied. In actuality, it was a series of carefully tuned text prompts with still images, clearly selected and shortened to misrepresent what the interaction is actually like. You can see some of the actual prompts and responses in a related blog post — which, to be fair, is linked in the video description, albeit below the ” . . . more.”

On one hand, Gemini really does appear to have generated the responses shown in the video. And who wants to see some housekeeping commands like telling the model to flush its cache? But viewers are misled about the speed, accuracy, and fundamental mode of interaction with the model.

For instance, at 2:45 in the video, a hand is shown silently making a series of gestures. Gemini quickly responds, “I know what you’re doing! You’re playing Rock, Paper, Scissors!”

Image Credits: Google/YouTube

But the first thing in the documentation of the capability is how the model does not reason based on seeing individual gestures. It must be shown all three gestures at once and prompted: “What do you think I’m doing? Hint: It’s a game.” It responds, “You’re playing rock, paper, scissors.”

Image Credits: Google

Despite the similarity, these don’t feel like the same interaction. They feel like fundamentally different interactions, one an intuitive, wordless evaluation that captures an abstract idea on the fly, another an engineered and heavily hinted interaction that demonstrates limitations as much as capabilities. Gemini did the latter, not the former. The “interaction” showed in the video didn’t happen.

Later, three sticky notes with doodles of the sun, Saturn, and Earth are placed on the surface. “Is this the correct order?” Gemini says, “No, the correct order is Sun, Earth, Saturn.” Correct! But in the actual (again, written) prompt, the question is “Is this the right order? Consider the distance from the sun and explain your reasoning.”

Image Credits: Google

Did Gemini get it right? Or did it get it wrong and needed a bit of help to produce an answer they could put in a video? Did it even recognize the planets, or did it need help there as well?

In the video, a ball of paper gets swapped around under a cup, which the model instantly and seemingly intuitively detects and tracks. In the post, not only does the activity have to be explained, but also the model must be trained (if quickly and using natural language) to perform it. And so on.

These examples may or may not seem trivial to you. After all, recognizing hand gestures as a game so quickly is actually really impressive for a multimodal model! So is making a judgment call on whether a half-finished picture is a duck or not! Although now, since the blog post lacks an explanation for the duck sequence, I’m beginning to doubt the veracity of that interaction as well.

Now, if the video had said at the start, “This is a stylized representation of interactions our researchers tested,” no one would have batted an eye — we kind of expect videos like this to be half factual, half aspirational.

But the video is called “Hands-on with Gemini” and when they say it shows “our favorite interactions,” it implies that the interactions we see are those interactions. They were not. Sometimes they were more involved; sometimes they were totally different; sometimes they don’t really appear to have happened at all. We’re not even told what model it is — the Gemini Pro one people can use now, or (more likely) the Ultra version slated for release next year?

Should we have assumed that Google was only giving us a flavor video when they described it the way they did? Perhaps then we should assume all capabilities in Google AI demos are being exaggerated for effect. I write in the headline that this video was “faked.” At first I wasn’t sure if this harsh language was justified (certainly Google doesn’t think so; a spokesperson asked me to change it). But despite including some real parts, the video simply does not reflect reality. It’s fake.

Google says that the video “shows real outputs from Gemini,” which is true, and that “we made a few edits to the demo (we’ve been upfront and transparent about this),” which isn’t. It isn’t a demo — not really — and the video shows very different interactions from those created to inform it.

Update: In a social media post made after this article was published, Google DeepMind’s VP of Research Oriol Vinyals showed a bit more of how “Gemini was used to create” the video. “The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.” (Emphasis mine.) Interestingly, it shows a pre-prompting sequence that lets Gemini answer the planets question without the sun hinting (though it does tell Gemini it’s an expert on planets and to consider the sequence of objects pictured).

Perhaps I will eat crow when, next week, the AI Studio with Gemini Pro is made available to experiment with. And Gemini may well develop into a powerful AI platform that genuinely rivals OpenAI and others. But what Google has done here is poison the well. How can anyone trust the company when they claim their model does something now? They were already limping behind the competition. Google may have just shot itself in the other foot.

More TechCrunch

Microsoft announced on Tuesday during its annual Build conference that it’s bringing “Windows Volumetric Apps” to Meta Quest headsets. The partnership will allow Microsoft to bring Windows 365 and local…

Microsoft’s new ‘Volumetric Apps’ for Quest headsets extend Windows apps into the 3D space

The spam reached Bluesky by first crossing over two other decentralized networks: Mastodon and Nostr.

The ‘vote Trump’ spam that hit Bluesky in May came from decentralized rival Nostr

Welcome to TechCrunch Fintech! This week, we’re looking at the continued fallout from Synapse’s bankruptcy, how Layer wants to disrupt SMB accounting, and much more! To get a roundup of…

There’s a real appetite for a fintech alternative to QuickBooks

The company is hoping to produce electricity at $13 per megawatt hour, which would be more than 50% cheaper than traditional onshore wind.

Bill Gates-backed wind startup AirLoom is raising $12M, filings reveal

Generative AI makes stuff up. It can be biased. Sometimes it spits out toxic text. So can it be “safe”? Rick Caccia, the CEO of WitnessAI, believes it can. “Securing…

WitnessAI is building guardrails for generative AI models

It’s not often that you hear about a seed round above $10 million. H, a startup based in Paris and previously known as Holistic AI, has announced a $220 million…

French AI startup H raises $220M seed round

Hey there, Series A to B startups with $35 million or less in funding — we’ve got an exciting opportunity that’s tailor-made for your growth journey! If you’re looking to…

Boost your startup’s growth with a ScaleUp package at TC Disrupt 2024

TikTok is pulling out all the stops to prevent its impending ban in the United States. Aside from initiating legal action against the U.S. government, that means shaping up its…

As a US ban looms, TikTok announces a $1M program for socially driven creators

Microsoft wants to put its Copilot everywhere. It’s only a matter of time before Microsoft renames its annual Build developer conference to Microsoft Copilot. Hopefully, some of those upcoming events…

Microsoft’s Power Automate no-code platform adds AI flows

Build is Microsoft’s largest developer conference and of course, it’s all about AI this year. So it’s no surprise that GitHub’s Copilot, GitHub’s “AI pair programming tool,” is taking center…

GitHub Copilot gets extensions

Microsoft wants to make its brand of generative AI more useful for teams — specifically teams across corporations and large enterprise organizations. This morning at its annual Build dev conference,…

Microsoft intros a Copilot for teams

Microsoft’s big focus at this year’s Build conference is generative AI. And to that end, the tech giant announced a series of updates to its platforms for building generative AI-powered…

Microsoft upgrades its AI app-building platforms

The U.K.’s data protection watchdog has closed an almost year-long investigation of Snap’s AI chatbot, My AI — saying it’s satisfied the social media firm has addressed concerns about risks…

UK data protection watchdog ends privacy probe of Snap’s GenAI chatbot, but warns industry

U.S. cell carrier Patriot Mobile experienced a data breach that included subscribers’ personal information, including full names, email addresses, home ZIP codes and account PINs, TechCrunch has learned. Patriot Mobile,…

Conservative cell carrier Patriot Mobile hit by data breach

It’s been three years since Spotify acquired live audio startup Betty Labs, and yet the music streaming service isn’t leveraging the technology to its fullest potential — at least not…

Spotify’s ‘Listening Party’ feature falls short of expectations

Alchemist Accelerator has a new pile of AI-forward companies demoing their wares today, if you care to watch, and the program itself is making some international moves into Tokyo and…

Alchemist’s latest batch puts AI to work as accelerator expands to Tokyo, Doha

“Late Pledge” allows campaign creators to continue collecting money even after the campaign has closed.

Kickstarter now lets you pledge after a campaign closes

Stack AI’s co-founders, Antoni Rosinol and Bernardo Aceituno, were PhD students at MIT wrapping up their degrees in 2022 just as large language models were becoming more mainstream. ChatGPT would…

Stack AI wants to make it easier to build AI-fueled workflows

Pinecone, the vector database startup founded by Edo Liberty, the former head of Amazon’s AI Labs, has long been at the forefront of helping businesses augment large language models (LLMs)…

Pinecone launches its serverless vector database out of preview

Young geothermal energy wells can be like budding prodigies, each brimming with potential to outshine their peers. But like people, most decline with age. In California, for example, the amount…

Special mud helps XGS Energy get more power out of geothermal wells

Featured Article

Sonos finally made some headphones

The market play is clear from the outset: The $449 headphones are firmly targeted at an audience that would otherwise be purchasing the Bose QC Ultra or Apple AirPods Max.

6 hours ago
Sonos finally made some headphones

Adobe says the feature is up to the task, regardless of how complex of a background the object is set against.

Adobe brings Firefly AI-powered Generative Remove to Lightroom

All cars suffer when the mercury drops, but electric vehicles suffer more than most as heaters draw more power and batteries charge more slowly as the liquid electrolyte inside thickens.…

Porsche Ventures invests in battery startup South 8 to boost cold-weather EV performance

Scale AI has raised a $1 billion Series F round from a slew of big-name institutional and corporate investors including Amazon and Meta.

Data-labeling startup Scale AI raises $1B as valuation doubles to $13.8B

The new coalition, Tech Against Scams, will work together to find ways to fight back against the tools used by scammers and to better educate the public against financial scams.

Meta, Match, Coinbase and others team up to fight online fraud and crypto scams

It’s a wrap: European Union lawmakers have given the final approval to set up the bloc’s flagship, risk-based regulations for artificial intelligence.

EU Council gives final nod to set up risk-based regulations for AI

London-based fintech Vitesse has closed a $93 million Series C round of funding led by investment giant KKR.

Vitesse, a payments and treasury management platform for insurers, raises $93M to fuel US expansion

Zen Educate, an online marketplace that connects schools with teachers, has raised $37 million in a Series B round of funding. The raise comes amid a growing teacher shortage crisis…

Zen Educate raises $37M and acquires Aquinas Education as it tries to address the teacher shortage

“When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine.”

Scarlett Johansson says that OpenAI approached her to use her voice

A new self-driving truck — manufactured by Volvo and loaded with autonomous vehicle tech developed by Aurora Innovation — could be on public highways as early as this summer.  The…

Aurora and Volvo unveil self-driving truck designed for a driverless future