Ollama
by Ollama Inc
Run large language models locally. Download, create, and manage AI models like Llama 3, Mistral, Gemma, and more on your own hardware.
Run AI Models on Your Own Hardware
Ollama makes it effortless to download, run, and manage large language models locally. Whether you're a developer building AI-powered applications, a researcher experimenting with model fine-tuning, or a privacy-conscious user who wants to keep conversations off the cloud — Ollama gives you full control over your AI stack.
With a single command, you can pull and run models like Llama 3.3, Mistral, Gemma 2, Phi-3, Code Llama, DeepSeek Coder, and dozens more. Ollama handles model weights, quantization, GPU acceleration, and memory management automatically.
Key Features
- One-command model downloads: `ollama pull llama3.3`
- GPU acceleration with automatic CUDA, ROCm, and Metal detection
- OpenAI-compatible REST API on localhost:11434
- Model library with 100+ pre-quantized models ready to run
- Modelfile system for creating custom models with system prompts and parameters
- Concurrent model loading and request handling
- Runs fully offline — no internet required after initial download
- Cross-platform: macOS, Linux, Windows, and Docker
Getting Started
After installing Ollama, open your terminal and run your first model:
```shell
# Pull and run Llama 3.3 (70B parameters)
ollama run llama3.3

# Pull a coding-focused model
ollama run codellama:13b

# List downloaded models
ollama list

# Serve the API
ollama serve
```
The built-in API is compatible with the OpenAI Chat Completions format, so you can point any OpenAI SDK client at `http://localhost:11434/v1` and it works out of the box.
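As a sketch of that compatibility, the snippet below builds a Chat Completions request against a local Ollama server using only Python's standard library. The model name and prompt are placeholders; sending the request assumes `ollama serve` is running:

```python
import json
import urllib.request

def build_chat_request(model, messages, base_url="http://localhost:11434"):
    """Build an HTTP request for Ollama's OpenAI-compatible endpoint."""
    payload = json.dumps({
        "model": model,
        "messages": messages,
        "stream": False,  # ask for one complete response instead of a stream
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("llama3.3", [{"role": "user", "content": "Hello!"}])
# With `ollama serve` running, send it and read the reply:
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```

Because the request shape matches OpenAI's, the same payload works with any OpenAI SDK by swapping in the local base URL.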
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ for 13B models |
| Storage | 4 GB per model | SSD recommended |
| GPU | Optional (CPU works) | NVIDIA 8GB+ VRAM or Apple Silicon |
| OS | macOS 12+, Linux, Windows 10+ | Latest stable release |
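A rough way to read the RAM row: a quantized model's weight file is approximately its parameter count times bits per weight, divided by 8. The helper below sketches that estimate (4-bit quantization is assumed here as a common local default; real files add some overhead for metadata and context):

```python
def approx_weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough size of a quantized model's weights in GB (ignores overhead)."""
    return params_billion * bits_per_weight / 8

approx_weights_gb(7)   # a 7B model at 4-bit: ~3.5 GB
approx_weights_gb(13)  # a 13B model at 4-bit: ~6.5 GB, hence the 16 GB+ advice
```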
Popular Models
| Model | Parameters | Best For |
|---|---|---|
| Llama 3.3 | 70B | General chat, reasoning, instruction following |
| Mistral | 7B | Fast inference, multilingual, coding |
| Code Llama | 7B / 13B / 34B | Code generation, debugging, completion |
| Gemma 2 | 2B / 9B / 27B | Lightweight tasks, edge deployment |
| DeepSeek Coder V2 | 16B / 236B | Advanced code generation and analysis |
| Phi-3 | 3.8B / 14B | Small footprint, mobile-friendly |
| LLaVA | 7B / 13B | Vision + language multimodal tasks |
Build Custom Models
Create specialized models using Modelfiles — similar to Dockerfiles but for AI models:
```
FROM llama3.3

SYSTEM """
You are a senior software architect. You give concise,
practical advice with code examples. You prefer modern
patterns and always consider security implications.
"""

PARAMETER temperature 0.3
PARAMETER num_ctx 4096
```
Save this as `Modelfile` and run `ollama create my-architect -f Modelfile` to create your custom model.
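Once created, the custom model is addressable by name through the same local API. As a hedged sketch (standard library only; assumes `ollama serve` is running and uses Ollama's native `/api/generate` endpoint rather than the OpenAI-compatible one):

```python
import json
import urllib.request

def build_generate_request(model, prompt, base_url="http://localhost:11434"):
    """Build a request for Ollama's native text-generation endpoint."""
    payload = json.dumps({
        "model": model,       # the name passed to `ollama create`
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("my-architect", "How should I structure a REST API?")
# resp = json.load(urllib.request.urlopen(req))  # needs a running server
# print(resp["response"])
```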
Privacy First
Everything runs on your machine. Your prompts, conversations, and data never leave your hardware. There is no telemetry, no cloud dependency, and no API keys required. Once a model is downloaded, Ollama works completely offline.
Ollama is free, open-source (MIT licensed), and backed by an active community with over 100,000 GitHub stars. It has become the standard tool for running local LLMs in development, testing, and production self-hosted environments.
> "Running Llama 3 locally with GPU acceleration is incredible. Privacy-first AI without subscriptions."

> "Ollama made running local LLMs trivial. The OpenAI-compatible API means all my existing tools just work."