Google Releases DiffusionGemma to Challenge How AI Text Is Generated

June 11, 2026

Google has introduced DiffusionGemma, an experimental open model that applies diffusion-based techniques to text generation rather than relying on the token-by-token approach used by traditional large language models. Released under an Apache 2.0 license, the 26-billion-parameter Mixture of Experts model is designed to generate text significantly faster on GPUs and is aimed at researchers and developers exploring latency-sensitive AI applications.

The model builds on Google’s Gemma 4 family and incorporates research from Gemini Diffusion. Unlike conventional autoregressive models that generate one token at a time, DiffusionGemma produces entire blocks of text simultaneously. Google says the approach can deliver up to four times faster text generation on GPUs, with performance exceeding 1,000 tokens per second on a single NVIDIA H100 and more than 700 tokens per second on an NVIDIA GeForce RTX 5090.
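To put the quoted throughput in terms of waiting time, the short sketch below converts tokens per second into seconds per response. The 500-token response length is a hypothetical figure chosen for illustration, and because Google describes both rates as floors ("exceeding" and "more than"), the results should be read as upper bounds on generation time under those claims.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time needed to produce num_tokens at a steady generation rate."""
    return num_tokens / tokens_per_second


# Rates quoted by Google; both are stated as lower bounds.
QUOTED_RATES = {"NVIDIA H100": 1000.0, "NVIDIA GeForce RTX 5090": 700.0}

RESPONSE_TOKENS = 500  # hypothetical response length, not from the article

for gpu, rate in QUOTED_RATES.items():
    seconds = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{gpu}: at most {seconds:.3f} s for {RESPONSE_TOKENS} tokens")
```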

DiffusionGemma is aimed at developers working on interactive workflows where response speed is critical. Google highlighted use cases including in-line editing, rapid iteration, code infilling, mathematical graphs, and amino acid sequence generation. The company said the model’s architecture allows 256 tokens to be generated in parallel, giving each token visibility into the broader context of the text being created.

The model has 26 billion total parameters in a Mixture of Experts design but activates only 3.8 billion of them during inference. According to Google, that enables it to run within 18 GB of VRAM when quantized, making it accessible on high-end consumer hardware.
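A rough back-of-the-envelope calculation shows why quantization matters for that memory figure. The sketch below estimates weight storage only, and the bit widths are assumptions: Google has not said which quantization format the 18 GB figure refers to. It also assumes that all 26 billion parameters must be held in memory even though only 3.8 billion are active per token, which is typical of Mixture of Experts models but is not confirmed for DiffusionGemma.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9    # total parameters, from the article
ACTIVE_PARAMS = 3.8e9  # parameters active during inference, from the article

# Bit widths are illustrative assumptions, not figures published by Google.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active share of parameters: {active_fraction:.1%}")
```

Under these assumptions, 4-bit weights would occupy about 13 GB, which fits inside an 18 GB budget with room left for activations and other runtime buffers, while 8-bit weights (about 26 GB) would not.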

Another key feature is the model’s ability to iteratively refine its own output. Rather than generating text in a single pass, DiffusionGemma repeatedly evaluates and improves an entire block of text, allowing it to correct errors during the generation process.

Google emphasized that the model remains experimental and is not intended to replace standard Gemma 4 deployments. The company said output quality remains lower than that of its autoregressive counterparts and recommended Gemma 4 for applications where generation quality is the primary consideration.

The release also serves as a practical test of diffusion-based text generation, an area that has attracted research interest for years but has been difficult to scale effectively. Google argues that diffusion changes how AI workloads utilize hardware by processing larger chunks of text simultaneously instead of advancing token by token.

The process resembles image-generation diffusion models. DiffusionGemma begins with a sequence of placeholder tokens, then performs multiple refinement passes before converging on a final output. Because the model evaluates larger sections of text at once, Google says it can handle certain non-linear generation tasks more effectively than traditional language models.
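The placeholder-then-refine loop can be illustrated with a toy sketch. This is not DiffusionGemma’s decoding algorithm, which Google has not detailed here. The names `toy_denoiser` and `refine`, the confidence-ranked commit schedule, and the use of a known target sentence in place of a real model are all assumptions made for illustration. The sketch shows only how a block of placeholders can be filled over several passes in an order that is not left to right; it does not model error correction.

```python
import random

MASK = "<mask>"


def toy_denoiser(block, target, rng):
    """Stand-in for a model: propose a token and a confidence per masked slot."""
    proposals = {}
    for i, token in enumerate(block):
        if token == MASK:
            # A real model would predict the token; this toy reads the target.
            proposals[i] = (target[i], rng.random())
    return proposals


def refine(target, num_passes=4, seed=0):
    """Fill a fully masked block over several passes, most confident slots first."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    history = [list(block)]
    for step in range(num_passes):
        proposals = toy_denoiser(block, target, rng)
        if not proposals:
            break
        # Commit an even share of the remaining slots on each pass;
        # the final pass commits whatever is left.
        remaining_passes = num_passes - step
        quota = -(-len(proposals) // remaining_passes)  # ceiling division
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:quota]:
            block[i] = proposals[i][0]
        history.append(list(block))
    return block, history


sentence = "the quick brown fox jumps over lazy dogs".split()
final_block, passes = refine(sentence)
for number, snapshot in enumerate(passes):
    print(number, " ".join(snapshot))
```

Each printed line is one snapshot of the block, starting fully masked and ending fully filled after four passes.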

Google pointed to fine-tuning as a way to improve performance for specialized workloads. In one example, AI startup Unsloth fine-tuned DiffusionGemma to solve Sudoku puzzles, a task Google said benefits from the model’s bi-directional attention mechanism.
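The difference between causal and bidirectional attention can be sketched with two small masks. The example below is a generic illustration, not DiffusionGemma’s actual masking scheme, and the function names are hypothetical. It treats one Sudoku row as nine positions and counts how many of them a given cell is allowed to see under each scheme.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """Position i may attend only to positions 0..i (autoregressive)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def visible_count(mask: list[list[bool]], position: int) -> int:
    """Number of positions that `position` is allowed to attend to."""
    return sum(mask[position])


ROW_CELLS = 9  # one Sudoku row, flattened left to right
cell = 3       # the fourth cell in the row

print(visible_count(causal_mask(ROW_CELLS), cell))         # 4
print(visible_count(bidirectional_mask(ROW_CELLS), cell))  # 9
```

In a puzzle where the value of a cell is constrained by cells on both sides of it, the causal mask hides five of the nine cells in the row from the fourth cell, while the bidirectional mask exposes all of them.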

The company is making the model available through Hugging Face and supporting deployment across a range of tools, including MLX, vLLM, Hugging Face Transformers, Unsloth, and NVIDIA NeMo. Support for llama.cpp is expected to arrive later. Google also said it worked with NVIDIA on optimizations for both consumer and enterprise hardware platforms, including GeForce RTX GPUs and NVIDIA’s Hopper and Blackwell systems.

The model’s performance advantages are most apparent in local and low-concurrency environments. Google said the benefits become less pronounced in high-volume cloud deployments, where traditional autoregressive models can use hardware more efficiently through batching.

This analysis is based on reporting from Google.

Images courtesy of Google.

This article was generated with AI assistance and reviewed for accuracy and quality.

Last updated: June 12, 2026
