Date: 2024-05-14
AI

Google reveals plans for upgrading AI in the real world through Gemini Live at Google I/O 2024

Kyle Wiggers

Gemini
Image Credits: Google

Google is improving its AI-powered chatbot Gemini so that it can better understand the world around it — and the people conversing with it.

At the Google I/O 2024 developer conference on Tuesday, the company previewed a new experience in Gemini called Gemini Live, which lets users have “in-depth” voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot’s speaking to ask clarifying questions, and it’ll adapt to their speech patterns in real time. And Gemini can see and respond to users’ surroundings, either via photos or video captured by their smartphones’ cameras.

“With Live, Gemini can better understand you,” Sissie Hsiao, GM for Gemini experiences at Google, said during a press briefing. “It’s custom-tuned to be intuitive and have a back-and-forth, actual conversation with [the underlying AI] model.”

Gemini Live is in some ways the evolution of Google Lens, Google’s long-standing computer vision platform for analyzing images and videos, and Google Assistant, Google’s AI-powered virtual assistant that generates and recognizes speech across phones, smart speakers and TVs.

At first glance, Live doesn’t seem like a drastic upgrade over existing tech. But Google claims it taps newer techniques from the generative AI field to deliver superior, less error-prone image analysis — and combines these techniques with an enhanced speech engine for more consistent, emotionally expressive and realistic multi-turn dialogue.

“It’s a real-time voice interface and [has] extremely powerful multimodal capabilities combined with long context,” Oriol Vinyals, principal scientist at DeepMind, Google’s AI research division, told TechCrunch in an interview. “You could imagine how that combination will feel very powerful.”

The technical innovations driving Live stem in part from Project Astra, a new initiative within DeepMind to create AI-powered apps and “agents” for real-time, multimodal understanding.

“We’ve always wanted to build a universal agent that will be useful in everyday life,” Demis Hassabis, CEO of DeepMind, said during the briefing. “Imagine agents that can see and hear what we do, better understand the context we’re in and respond quickly in conversation, making the pace and quality of interactions feel much more natural.”

Gemini Live — which won’t launch until later this year — can answer questions about things within view (or recently within view) of a smartphone’s camera, like which neighborhood a user might be in or the name of a part on a broken bicycle. Pointed at a portion of computer code, Live can explain what that code does. Or, asked about where a pair of glasses might be, Live can say where it last “saw” the glasses.

Live is also designed to serve as a virtual coach of sorts, helping users rehearse for events, brainstorm ideas and so on. Live can suggest which skills to highlight in an upcoming job or internship interview, for instance, or give public speaking advice.

“Gemini Live can provide information more succinctly and answer more conversationally than, for example, if you’re interacting in just text,” Hsiao said. “We think that an AI assistant should be able to solve complex problems … and also feel very natural and fluid when you engage with it.”

Gemini Live’s ability to “remember” is made possible by the architecture of the model underpinning it: Gemini 1.5 Pro, the current flagship in Google’s Gemini family of generative AI models (along with, to a lesser extent, other “task-specific” generative models). Gemini 1.5 Pro has a longer-than-average context window, meaning it can take in and reason over a lot of data, about an hour of video (RIP, smartphone batteries), before crafting a response.

“That’s hours of video that you could have interacting with the model, and it would remember all that has happened before,” Vinyals said.

Live is reminiscent of the generative AI behind Meta’s Ray-Ban glasses, which similarly can look at images captured by a camera and interpret them in near-real time. Judging from the pre-recorded demo reels Google showed during the briefing, it’s also quite similar — conspicuously so — to OpenAI’s recently revamped ChatGPT.

One key difference between the new ChatGPT and Gemini Live is that Gemini Live won’t be free. Once it launches, Live will be exclusive to Gemini Advanced, a more sophisticated version of Gemini that’s gated behind the Google One AI Premium Plan, priced at $20 per month.

Perhaps in a jab at Meta, one of Google’s demos showed a person wearing AR glasses equipped with a Gemini Live-like app. Google — doubtless keen to avoid another dud in the eyewear department — declined to say whether those glasses or any glasses powered by its generative AI would come to market in the near future.

Vinyals didn’t completely shut down the idea, though. “We’re still prototyping and, of course, showcasing [Astra and Gemini Live] to the world,” he said. “We’re seeing the reaction from folks that can try it, and that will inform where we go.”

Other Gemini updates

Beyond Live, Gemini is getting a range of upgrades to make it more useful day-to-day.

Gemini Advanced users in more than 150 countries and over 35 languages can take advantage of Gemini 1.5 Pro’s larger context to have the chatbot analyze, summarize and answer questions about long (up to 1,500 pages) documents. (While Live is arriving later in the year, Gemini Advanced users can interact with Gemini 1.5 Pro starting today.) Documents can now be imported from Google Drive or uploaded directly from a mobile device.

Later this year for Gemini Advanced users, the context window will grow even larger — to 2 million tokens — and bring with it support for uploading videos (up to two hours in length) to Gemini and having Gemini analyze big codebases (more than 30,000 lines of code). 

Google claims that the large context window will improve Gemini’s image understanding. For example, given a photo of a fish dish, Gemini will be able to suggest a comparable recipe. Or, given a math problem, Gemini will provide step-by-step instructions on how to solve it. 

And it’ll help Gemini plan trips.

In the coming months, Gemini Advanced will gain a new “planning experience” that creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes. 

In the more immediate future, Gemini Advanced users will be able to create Gems, custom chatbots powered by Google’s Gemini models. Along the lines of OpenAI’s GPTs, Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and shared with others or kept private. No word on whether Google plans to launch a storefront for Gems like OpenAI’s GPT Store; hopefully we’ll learn more as I/O goes on.

Soon, Gems and Gemini proper will be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep and YouTube Music, to complete various labor-saving tasks.

“Let’s say you have a flier from your kid’s school, and there’s all these events that you want to add to your personal calendar,” Hsiao said. “You’ll be able to take a picture of this flier and ask the Gemini app to create these calendar entries directly onto your calendar. This is going to be a great time saver.”

Given generative AI’s tendency to get summaries wrong and generally go off the rails (plus Gemini’s not-so-glowing early reviews), take Google’s claims with a grain of salt. But if the improved Gemini and Gemini Advanced actually perform as Hsiao describes — and that’s a big if — they could be great time savers indeed. 
