What Google announced

Gemini keeps getting stronger, but you only run it on Google’s terms. Gemma is Google’s open-weight line (weights you can download and run yourself) and has been the practical option for teams that need local or self-hosted AI. Gemma 3 is over a year old now. Google is replacing that generation with Gemma 4: four model sizes tuned for local use, plus a licensing change that addresses what developers have been complaining about for years.

Starting now, builders can work with Gemma 4 in Google’s tooling and pull weights from the usual hubs. The headline legal move: Google is dropping its custom Gemma license and releasing Gemma 4 under Apache 2.0.

Short version: Gemma 4 is a family of four open-weight models (two large for strong GPUs, two “effective” small ones for phones and edge). Same technical lineage as closed Gemini 3-class models, but runnable offline if you have the hardware. License is now standard Apache 2.0 instead of Google’s older, heavier terms.

The four Gemma 4 sizes

  • 26B MoE (fast local inference on serious GPUs): Mixture of Experts with only ~3.8B parameters active per forward pass, which yields higher tokens/sec than many similarly sized dense models.
  • 31B Dense (quality-first local work): slower than the MoE; Google expects developers to fine-tune it for specific domains.
  • E2B, “Effective 2B” (mobile / edge): low memory and battery use; optimized with Qualcomm and MediaTek for phones, Raspberry Pi, and Jetson Nano.
  • E4B, “Effective 4B” (stronger edge / on-device): same positioning as E2B with more capacity; Google cites near-zero latency versus Gemma 3 for these tiers.

26B MoE and 31B dense: what “local” actually costs

Google targets unquantized bfloat16 on a single 80GB Nvidia H100 for both large variants — that is still “local” in the data-center sense, not a laptop. An H100 is serious money. The point is: quantize and the same architectures shrink onto consumer GPUs that normal builders actually own.
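The bf16-versus-quantized trade-off is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming 16 bits per parameter for bfloat16, 4 bits for an aggressive quantization, and a rough (assumed) 20% overhead factor for activations, KV cache, and runtime buffers:

```python
def model_memory_gb(params_b: float, bits_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter bytes times an assumed ~20% overhead
    factor for activations, KV cache, and runtime buffers."""
    param_bytes = params_b * 1e9 * bits_per_param / 8
    return param_bytes * overhead / 1e9

# 31B dense in bfloat16 (16 bits/param): ~74 GB, which fits a single 80GB H100
print(round(model_memory_gb(31, 16), 1))

# The same weights at 4-bit quantization: ~19 GB, within reach of a 24GB consumer GPU
print(round(model_memory_gb(31, 4), 1))
```

The numbers are estimates, not vendor specs, but they show why “single H100” and “quantize onto a consumer GPU” are both plausible claims for the same 31B architecture.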

Google also stresses lower latency so on-prem inference feels usable. The MoE design is the speed play; the 31B dense is the “make it good, then adapt it” play.

E2B and E4B: Pixel-class and maker boards

These are the models meant to run where power and RAM hurt. Google says the Pixel team collaborated with chip vendors so E2B/E4B behave well on real phone silicon, not just slides. Less memory, less battery drain than Gemma 3 on comparable tasks, with latency Google describes as nearly instant for typical on-device loops.

[Figure: Elo score versus total parameters (log scale), with Gemma 4 thinking variants highlighted in blue.]

Capabilities: reasoning, agents, code, vision, speech

Google claims Gemma 4 clears Gemma 3 across the board and competes near the top of open-model leaderboards — citing, for example, the Gemma 4 31B placing around third on an Arena-style open ranking, behind larger models like GLM-5 and Kimi 2.5, while staying much smaller and therefore cheaper to host yourself.

Under the hood, Google aligns Gemma 4 with the same family as Gemini 3 (closed): stronger reasoning, math, and instruction-following. The product story in 2026 is agentic workflows, so Gemma 4 adds native function calling, structured JSON output, and instructions aimed at common tools and APIs.
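The announcement does not spell out Gemma 4’s exact wire format for tool calls, but the general pattern behind “native function calling plus structured JSON output” looks the same across models: declare a tool schema, have the model emit a JSON call, then parse and dispatch it. A minimal sketch with a hypothetical `get_weather` tool (the schema style and all names here are assumptions, not Gemma 4’s documented format):

```python
import json

# Hypothetical tool schema in the common "function declaration" style.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch_tool_call(raw_model_output: str, tools: dict) -> str:
    """Parse the model's structured JSON output and invoke the matching tool."""
    call = json.loads(raw_model_output)
    fn = tools[call["name"]]
    return fn(**call["arguments"])

# Stand-in for a real model response produced under JSON-constrained output:
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'

result = dispatch_tool_call(model_output, {"get_weather": lambda city: f"Sunny in {city}"})
print(result)  # Sunny in Berlin
```

The value of native structured output is that the `json.loads` step stops being the fragile part — the model is constrained to emit valid JSON rather than prose you have to scrape.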

Code: Google positions Gemma 4 as a way to get high-quality generation offline if you can run the big checkpoints — a different trade than always calling Gemini Pro or cloud coding agents.

Vision: Improved multimodal handling for things like OCR and chart reading on local hardware.

Speech: E2B and E4B include native speech recognition; Gemma 3 had speech features, but Google is signaling a quality step for Gemma 4.

Context windows and languages

Gemma 4 supports 140+ languages. Context lengths: 128k tokens on the edge models (E2B / E4B), 256k on the 26B and 31B variants. That is strong for self-hosted models; it is still well short of cloud Gemini’s 1M token class of context.

Why Apache 2.0 is the real story for some teams

Older Gemma releases shipped under a custom Google license. Developers flagged problems: a prohibited-use list Google could change unilaterally, expectations that you police downstream projects, and language some read as affecting other models trained on synthetic data from Gemma. Whether every reading held up in court mattered less than the risk perception: legal ambiguity kills enterprise adoption.

Apache 2.0 is boring in the best way: widely understood, permissive for commercial use, no surprise one-sided updates. Google’s bet is that predictable terms grow the ecosystem they keep calling the Gemmaverse — more ports, more fine-tunes, more products that never phone home to Mountain View.

Gemini Nano 4: confirmed for Pixel

On-device Android AI under the Gemini Nano name (scam warnings, summaries, call recap without sending audio to the cloud) has always been related to Gemma. Google now states explicitly that the next-generation Nano, Gemini Nano 4, will use 2B and 4B variants derived from Gemma 4 E2B and E4B. Today’s Pixel stack runs Nano 3, tied to Gemma 3n.

Developers can prototype with E2B/E4B in the latest AI Core Developer Preview; Google says those designs should be forward-compatible when Nano 4 ships. More detail will likely land at Google I/O in the coming weeks.

Where to try and download

  • AI Studio: larger checkpoints (31B and 26B MoE).
  • AI Edge Gallery: E4B and E2B.
  • Weights: full downloads via Hugging Face, Kaggle, and Ollama.
  • Cloud: Google will also host runs on Google Cloud for a fee if you want managed inference instead of your own GPUs.

Google’s announcement (video)

Official walkthrough and positioning from Google:

Watch on YouTube →

Bottom line

Gemma 4 is Google’s attempt to own the open local tier the same way Gemini owns the cloud: four clear SKUs from H100-class down to phone-class, agent-ready APIs, and a license developers do not have to parse with a lawyer on speed dial. Whether leaderboard claims and “near-zero latency” hold up is for independent benchmarks — but Apache 2.0 alone will pull teams back who walked away from Gemma 3 over terms.