Ollama
by Ollama Inc
Run large language models locally. Download, create, and manage AI models like Llama 3, Mistral, Gemma, and more on your own hardware.
Run AI Models on Your Own Hardware
Ollama makes it effortless to download, run, and manage large language models locally. Whether you're a developer building AI-powered applications, a researcher experimenting with model fine-tuning, or a privacy-conscious user who wants to keep conversations off the cloud — Ollama gives you full control over your AI stack.
With a single command, you can pull and run models like Llama 3.3, Mistral, Gemma 2, Phi-3, Code Llama, DeepSeek Coder, and dozens more. Ollama handles model weights, quantization, GPU acceleration, and memory management automatically.
Key Features
- One-command model downloads: `ollama pull llama3.3`
- GPU acceleration with automatic CUDA, ROCm, and Metal detection
- OpenAI-compatible REST API on localhost:11434
- Model library with 100+ pre-quantized models ready to run
- Modelfile system for creating custom models with system prompts and parameters
- Concurrent model loading and request handling
- Runs fully offline — no internet required after initial download
- Cross-platform: macOS, Linux, Windows, and Docker
Getting Started
After installing Ollama, open your terminal and run your first model:
```shell
# Pull and run Llama 3.3 (70B parameters)
ollama run llama3.3

# Pull a coding-focused model
ollama run codellama:13b

# List downloaded models
ollama list

# Serve the API
ollama serve
```
The built-in API is compatible with the OpenAI Chat Completions format, so you can point any OpenAI SDK client at `http://localhost:11434/v1` and it works out of the box.
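As a sketch of that compatibility, the snippet below builds a Chat Completions request against a local Ollama server using only Python's standard library. The model name and prompt are placeholders; sending the request assumes `ollama serve` is running:

```python
import json
import urllib.request

def build_chat_request(model, messages, base_url="http://localhost:11434"):
    """Build an HTTP request for Ollama's OpenAI-compatible endpoint."""
    payload = json.dumps({
        "model": model,
        "messages": messages,
        "stream": False,  # ask for one complete response instead of a stream
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("llama3.3", [{"role": "user", "content": "Hello!"}])
# With `ollama serve` running, send it and read the reply:
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```

Because the request shape matches OpenAI's, the same payload works with any OpenAI SDK by swapping in the local base URL.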
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ for 13B models |
| Storage | 4 GB per model | SSD recommended |
| GPU | Optional (CPU works) | NVIDIA 8GB+ VRAM or Apple Silicon |
| OS | macOS 12+, Linux, Windows 10+ | Latest stable release |
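A rough way to read the RAM row: a quantized model's weight file is approximately its parameter count times bits per weight, divided by 8. The helper below sketches that estimate (4-bit quantization is assumed here as a common local default; real files add some overhead for metadata and context):

```python
def approx_weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough size of a quantized model's weights in GB (ignores overhead)."""
    return params_billion * bits_per_weight / 8

approx_weights_gb(7)   # a 7B model at 4-bit: ~3.5 GB
approx_weights_gb(13)  # a 13B model at 4-bit: ~6.5 GB, hence the 16 GB+ advice
```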
Popular Models
| Model | Parameters | Best For |
|---|---|---|
| Llama 3.3 | 70B | General chat, reasoning, instruction following |
| Mistral | 7B | Fast inference, multilingual, coding |
| Code Llama | 7B / 13B / 34B | Code generation, debugging, completion |
| Gemma 2 | 2B / 9B / 27B | Lightweight tasks, edge deployment |
| DeepSeek Coder V2 | 16B / 236B | Advanced code generation and analysis |
| Phi-3 | 3.8B / 14B | Small footprint, mobile-friendly |
| LLaVA | 7B / 13B | Vision + language multimodal tasks |
Build Custom Models
Create specialized models using Modelfiles — similar to Dockerfiles but for AI models:
```
FROM llama3.3

SYSTEM """
You are a senior software architect. You give concise,
practical advice with code examples. You prefer modern
patterns and always consider security implications.
"""

PARAMETER temperature 0.3
PARAMETER num_ctx 4096
```
Save this as `Modelfile` and run `ollama create my-architect -f Modelfile` to create your custom model.
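Once created, the custom model is addressable by name through the same local API. As a hedged sketch (standard library only; assumes `ollama serve` is running and uses Ollama's native `/api/generate` endpoint rather than the OpenAI-compatible one):

```python
import json
import urllib.request

def build_generate_request(model, prompt, base_url="http://localhost:11434"):
    """Build a request for Ollama's native text-generation endpoint."""
    payload = json.dumps({
        "model": model,       # the name passed to `ollama create`
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("my-architect", "How should I structure a REST API?")
# resp = json.load(urllib.request.urlopen(req))  # needs a running server
# print(resp["response"])
```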
Privacy First
Everything runs on your machine. Your prompts, conversations, and data never leave your hardware. There is no telemetry, no cloud dependency, and no API keys required. Once a model is downloaded, Ollama works completely offline.
Ollama is free, open-source (MIT licensed), and backed by an active community with over 100,000 GitHub stars. It has become the standard tool for running local LLMs in development, testing, and production self-hosted environments.
> "Running Llama 3 locally with GPU acceleration is incredible. Privacy-first AI without subscriptions."

> "Ollama made running local LLMs trivial. The OpenAI-compatible API means all my existing tools just work."