Google's Gemini Avatar Clones You in Minutes, Raising Stakes for AI Video Creation

Google has rolled out a feature that lets subscribers generate digital versions of themselves for use in AI-produced videos. The tool, powered by the company’s new Gemini Omni model, captures facial details, head movements and voice patterns after a short recording session. But the realism comes with questions about daily use, potential misuse and the line between convenience and discomfort.

Users open the Gemini app or head to gemini.google.com. They tap into settings, select Avatar and begin a guided process. The phone camera points at the face. A series of two-digit numbers appears on screen, and the user reads them aloud. Then the user looks straight ahead and turns the head right, then left. The entire capture takes less than two minutes in many cases. A thumbnail photo confirms success. From there the likeness sits ready for prompts.

And it works. Android Authority tested the feature shortly after wider availability began. The resulting clone resembled the tester. Its voice carried similar tone and cadence. One prompt read: “create a video of me wearing a t shirt with the Android Authority logo. Show me at the Google campus hanging out with the different Android figurines.” The output placed the avatar in that scene. Movements looked natural enough that family members might not immediately spot the artifice. Yet subtle cues gave it away to trained eyes. The author called it impressive but admitted he probably wouldn’t reach for it often. He also described it as interesting. And a bit creepy.

Chrome Unboxed went further. Senior editor Robby Payne completed setup in roughly two minutes. Video generation followed soon after. He fed the prompt “@robby.n.payne Tell me about how easy it is to make a Gemini Avatar. Also, make my background a penthouse view of the Chicago skyline.” The 10-second clip emerged with convincing facial tracking, micro-expressions and lip sync. The author labeled the experience an absolute trip. Slightly unsettling. Yet a massive milestone for personal content creation.

WIRED writer Reece Rogers sat in a well-lit room and followed the same steps. Five minutes later he possessed his own digital stand-in. He generated clips of the avatar singing happy birthday to a dinosaur at Dolores Park. The result felt hyper-realistic. “Unnervingly me,” he wrote. The clone appeared ready for any scene, any script. Rogers felt creeped out despite minor stutters in the output. The experience left him questioning how often such clones would appear in future media.

Google introduced the capability alongside Gemini Omni at its I/O event. The model generates video from multiple inputs. Text prompts shape scenes. Existing images or audio can influence style. Once created, the avatar slots into those generations via simple commands like @me or the user’s name. Google’s official blog positioned the update as part of a broader push toward proactive, agentic assistance in the Gemini app. Features vary by subscription tier and region. Access requires a Google AI Plus, Pro or Ultra plan. Users must be 18 or older. The account owner must perform the scan in person.

TechCrunch reported that Google added safeguards. Onboarding demands the user read numbers aloud while the camera records. This step aims to tie the likeness to a real individual and deter casual deepfake attempts. Every generated video carries an invisible SynthID watermark developed by Google DeepMind. The mark allows verification of AI origin. Nicole Brichtova, director of product management at Google DeepMind, told TechCrunch the process focuses on preventing harm without blocking benign creativity.

Yet limits surface quickly. One Google AI Pro subscriber told Android Authority he exhausted his five-hour usage quota with a single failed video prompt that incorporated the avatar. The system consumed the full allowance in minutes before the clip even completed. Such constraints highlight current infrastructure costs for high-fidelity video synthesis. They also suggest the feature remains resource-intensive even after months of refinement.

The rollout followed earlier hints. Android Authority first spotted references to avatars and “likeness” in APK teardowns back in March. Code mentioned inserting a 3D representation of the user into generative content. Prompts could call @me to pull the saved model. By late May the wider release began. Google posted step-by-step instructions on X. “Easily add yourself to your video creations in Gemini,” the official @GeminiApp account wrote. The thread directed users to the support page for details.

Comparisons to past efforts feel inevitable. Apple offers Personas for Vision Pro that recreate facial expressions during calls. Samsung provides Likeness for its XR headset. OpenAI experimented with similar cameo-style avatars in its short-lived Sora app before shuttering consumer access. Google’s version stands out because it ties directly to a general-purpose creative studio inside the Gemini app. Users move from avatar creation to video editing without switching tools. They can remix backgrounds, adjust camera angles or alter wardrobe through conversational follow-ups.

Business applications appear obvious. Marketers could produce personalized explainer videos at scale. Educators might insert themselves into animated lessons. Companies already testing Google Vids or Flow could layer custom avatars atop scripted content. Yet the personal dimension dominates early coverage. Creators talk about inserting themselves into fantasy scenes or generating consistent on-camera presence without daily filming. The barrier collapses. A few taps replace hours in front of lights and lenses.

Concerns persist. Hyper-realistic clones could fuel misinformation if watermarks are stripped or ignored. Voice replication raises consent issues for family members or colleagues whose speech patterns might be approximated. Google insists the feature stays locked to the creating account. No public sharing of raw avatar models exists yet. Still, once a convincing video circulates, provenance becomes harder to prove without forensic tools.

Early testers note imperfections. Eyes sometimes betray the synthetic origin. Lighting mismatches occur when the prompt shifts environments dramatically. Lip sync slips during rapid speech. These artifacts remind viewers that they are watching generated footage. But the gap narrows with each update. Gemini Omni Flash delivers faster iterations for short-form platforms like YouTube Shorts. Longer, more polished clips remain possible for subscribers willing to wait.

So what happens next? The technology sits at an inflection point. Consumer adoption will test Google’s guardrails. If usage stays playful, the feature could normalize personal AI actors in everyday communication. Holiday greetings. Professional updates. Virtual presentations. But if bad actors find workarounds, pressure will mount for stricter identity verification or outright limits on facial replication.

Google continues to iterate. Recent app updates emphasize agentic behaviors. Gemini Spark acts as a 24/7 helper that can reference the user’s avatar in proactive briefings. The company frames the entire stack as an evolution of the assistant experience rather than a standalone video toy. That integration may determine success. Standalone novelty fades. Embedded utility endures.

For now the reaction splits. Some call it mind-blowing. Others find it eerie. Most agree the output exceeds expectations for a mobile-first implementation. The hands-on reports from Chrome Unboxed, WIRED and Android Authority paint a consistent picture. Creation is simple. Results impress. Discomfort lingers. And the videos keep coming.

Whether this becomes a daily tool or an occasional experiment depends on individual comfort with seeing a convincing digital twin deliver lines that were never spoken. The technology has arrived. The conversation about its place in daily life has only begun.
