Gemma 4 is Google’s open-weights model line that you can run on your own hardware. This guide uses Ollama to install Gemma 4 quickly, then shows how a web frontend or backend talks to the model through Ollama’s local API at localhost:11434. The weights never live inside a static HTML file; instead, a browser tab or a server you control sends prompts to a machine that has the model loaded.

What this guide covers

You will install Ollama, pull a Gemma 4 tag, verify inference from the terminal, and then call the same model from application code (with notes on CORS and backend proxies). Typical setups: your laptop, a team server, or a cloud GPU you operate; not a public page that expects the model to be bundled into client-side JavaScript.

End state: Ollama serves Gemma 4 at http://localhost:11434. Your web stack (or a small proxy you add) sends JSON to that API and streams or displays the model’s text response.

Prerequisites and hardware

  • Operating system: macOS, Windows, or Linux (Ollama supports all three).
  • Disk space: several gigabytes per variant; larger tags need more (see below).
  • RAM / GPU: Ollama ships quantized builds so you can run smaller variants on consumer hardware. Bigger tags need more VRAM or unified memory; if inference is slow, pick gemma4:e2b or gemma4:e4b.
  • Network: required only to download the model the first time. After that, generation can run fully offline.

Google publishes approximate VRAM for weights only (BF16 / quantized) in its Gemma core documentation; real usage grows with context length (KV cache). Treat tables as planning hints, not guarantees.
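To see why context length matters for memory planning, the KV cache can be estimated from a model's shape. The sketch below uses the standard formula (2 tensors, K and V, per layer); the layer, head, and dimension numbers in the example are placeholders, not real Gemma 4 specs — substitute the values for your variant.

```javascript
// Rough KV-cache size: 2 (K and V) * layers * KV heads * head dim
// * context length * bytes per value (2 for FP16/BF16 caches).
function kvCacheBytes({ layers, kvHeads, headDim, contextLen, bytesPerValue = 2 }) {
  return 2 * layers * kvHeads * headDim * contextLen * bytesPerValue;
}

// Example with made-up architecture numbers (NOT real Gemma 4 specs):
const bytes = kvCacheBytes({ layers: 32, kvHeads: 8, headDim: 128, contextLen: 131072 });
console.log((bytes / 1024 ** 3).toFixed(1) + " GiB"); // prints 16.0 GiB
```

The takeaway: a full 128K-token context can add double-digit gigabytes on top of the weights, which is why the published weight-only numbers are a floor, not a budget.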

Pick a Gemma 4 variant

Ollama exposes Gemma 4 under the gemma4 library name. Tags follow Google’s four sizes. Always confirm current tags on ollama.com/library/gemma4/tags.

  Ollama tag (typical)   Role              Notes
  gemma4:e2b             Lightest          Best for laptops, long battery, or tight RAM; 128K context class.
  gemma4:e4b             Default balance   Often the practical "start here" tag for development; 128K context class.
  gemma4:26b             MoE throughput    Stronger reasoning on capable GPUs; 256K context class. Heavier download and RAM.
  gemma4:31b             Dense quality     Highest quality in the family for local use; 256K context class. Needs serious hardware.

Running ollama pull gemma4 without a tag usually pulls the library’s default tag (often aligned with E4B); specify a tag if you want a different size.

Step 1: Install Ollama

  1. Open ollama.com/download and install the package for your OS.

  2. In a terminal, verify the CLI:

    ollama --version

Use a recent Ollama release. Gemma 4 landed after older builds; if pull fails with manifest or compatibility errors, update Ollama first.

Step 2: Download Gemma 4

Download the weights once (requires internet during this step):

ollama pull gemma4

Or pin a size explicitly, for example:

ollama pull gemma4:e2b
ollama pull gemma4:26b

Confirm the model is registered:

ollama list
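If you want to check the same thing from application code, Ollama exposes the installed models over HTTP as GET /api/tags, which returns a JSON payload of the form { "models": [{ "name": "gemma4:e4b", ... }] }. A small helper, sketched under that payload shape:

```javascript
// Check whether a model tag (exact, or any tag of a base name) is
// installed, given the JSON payload returned by GET /api/tags.
function hasModel(tagsPayload, name) {
  return (tagsPayload.models || []).some(
    (m) => m.name === name || m.name.startsWith(name + ":")
  );
}

// Usage against a running Ollama:
// const payload = await (await fetch("http://127.0.0.1:11434/api/tags")).json();
// console.log(hasModel(payload, "gemma4"));
```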

Step 3: Test from the terminal

Quick sanity check:

ollama run gemma4 "Summarize local LLM setup in one sentence."

Vision (image input): for multimodal models, Ollama supports image paths in the prompt. Google’s Ollama + Gemma guide shows the pattern: include the file path in the prompt (adjust the path format for your OS).

Raw HTTP check (optional):

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Say hello in five words.",
  "stream": false
}'

Use Gemma 4 from a web app

Ollama listens on port 11434. Your application can:

  • Server-side: Node, Python, PHP, or edge functions call http://127.0.0.1:11434 with fetch or HTTP clients. This avoids browser CORS limits and is the pattern for production-style apps.
  • Client-side: JavaScript in the browser can call the same URL only if the page origin is allowed by Ollama’s CORS rules (see next section).

Example: non-streaming generate from JavaScript (suitable for a local dev page on an allowed origin):

const res = await fetch("http://127.0.0.1:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4",
    prompt: "Explain fetch() in one paragraph.",
    stream: false,
  }),
});
const data = await res.json();
console.log(data.response);

For chat-style APIs and streaming tokens, use Ollama’s /api/chat endpoint with stream: true and read the response body as newline-delimited JSON (each line is one JSON object carrying a token delta); see Ollama’s API documentation.
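Because network chunks do not arrive aligned to line boundaries, streaming code needs a small buffer that only parses complete lines. A minimal sketch, assuming the newline-delimited JSON framing Ollama uses for streamed responses:

```javascript
// Accumulate network chunks, split off complete lines, and parse each
// line as one JSON object; a trailing partial line stays buffered.
function createNdjsonParser() {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep any incomplete trailing line
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  };
}

// Reading /api/chat with stream: true (sketch; assumes a running Ollama):
// const res = await fetch("http://127.0.0.1:11434/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({
//     model: "gemma4",
//     messages: [{ role: "user", content: "Hello" }],
//     stream: true,
//   }),
// });
// const feed = createNdjsonParser();
// for await (const chunk of res.body) {
//   for (const msg of feed(chunk.toString())) {
//     process.stdout.write(msg.message?.content ?? "");
//   }
// }
```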

CORS, localhost, and when to use a backend proxy

Browsers block cross-origin requests unless the server sends the right CORS headers. Ollama can be configured with OLLAMA_ORIGINS so specific web origins (your dev server URL, an internal dashboard, or a browser extension scheme) may call the API directly.

Patterns that usually work well:

  • Local full-stack app: Add a route such as POST /api/chat on your Next.js, Express, or FastAPI app. The browser talks to your origin; the server forwards the body to 127.0.0.1:11434. No CORS issue for the Ollama hop.
  • Browser-only experiments: Set OLLAMA_ORIGINS to include your dev origin (or follow Ollama docs for your OS) so preflight requests succeed. Avoid wildcard origins on shared machines.

If you deploy a public website, do not expose an open Ollama port to the internet without authentication and network controls. Run inference behind your API with rate limits and auth, or use managed cloud inference instead.

Alternatives: Hugging Face and Python

If you need full checkpoints, fine-tuning, or integration with PyTorch pipelines, download Gemma 4 from Hugging Face or Kaggle and follow Google’s notebooks for those platforms.

Those paths are heavier to operate but give maximum control compared to Ollama’s pre-quantized GGUF workflow.

Troubleshooting

  • Pull fails or unknown model: Update Ollama, then retry ollama pull gemma4:<tag>. Confirm the tag exists on the official library page.
  • Out of memory or extreme slowness: Switch to e2b or e4b, close other GPU-heavy apps, or run on a machine with more unified memory / VRAM.
  • Browser errors mentioning CORS: Use a server-side proxy or configure OLLAMA_ORIGINS for your dev URL.
  • Need managed hosting: Google documents cloud deployment patterns (for example Cloud Run + Ollama + Gemma) for teams that do not want to run hardware on a desk.