Run AI Models on Your Own Hardware

Ollama makes it effortless to download, run, and manage large language models locally. Whether you're a developer building AI-powered applications, a researcher experimenting with model fine-tuning, or a privacy-conscious user who wants to keep conversations off the cloud — Ollama gives you full control over your AI stack.

With a single command, you can pull and run models like Llama 3.3, Mistral, Gemma 2, Phi-3, Code Llama, DeepSeek Coder, and dozens more. Ollama handles model weights, quantization, GPU acceleration, and memory management automatically.

Key Features

  • One-command model downloads: ollama pull llama3.3
  • GPU acceleration with automatic CUDA, ROCm, and Metal detection
  • OpenAI-compatible REST API on localhost:11434
  • Model library with 100+ pre-quantized models ready to run
  • Modelfile system for creating custom models with system prompts and parameters
  • Concurrent model loading and request handling
  • Runs fully offline — no internet required after initial download
  • Cross-platform: macOS, Linux, Windows, and Docker

Getting Started

After installing Ollama, open your terminal and run your first model:

# Pull and run Llama 3.3 (70B parameters)
ollama run llama3.3

# Pull a coding-focused model
ollama run codellama:13b

# List downloaded models
ollama list

# Start the API server (not needed if the Ollama desktop app is already running)
ollama serve

The built-in API is compatible with the OpenAI Chat Completions format, so you can point any OpenAI SDK client at http://localhost:11434/v1 (any non-empty string works as the API key) and it works out of the box.
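As a sketch of what a request looks like using only the Python standard library (this assumes the server is running locally and llama3.3 has already been pulled; the helper names are illustrative, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> bytes:
    """Build an OpenAI-style chat-completion request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return json.dumps(payload).encode("utf-8")

def chat(model: str, prompt: str) -> str:
    """POST the request to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]

# With a running server, you would call:
#   print(chat("llama3.3", "Explain quantization in one sentence."))
```

Because the endpoint speaks the OpenAI wire format, the same code works unchanged against any other OpenAI-compatible backend by swapping the URL.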

System Requirements

Component   Minimum                         Recommended
RAM         8 GB                            16 GB+ for 13B models
Storage     4 GB per model                  SSD recommended
GPU         Optional (CPU works)            NVIDIA GPU with 8 GB+ VRAM or Apple Silicon
OS          macOS 12+, Linux, Windows 10+   Latest stable release
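As a rough rule of thumb (an approximation, not an official sizing guide), the weights of a quantized model need about parameters × bits-per-weight / 8 bytes of memory, which explains the RAM figures above:

```python
def approx_weight_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate memory needed for model weights alone, in GB.

    Real usage is higher: budget extra headroom for the KV cache,
    context window, and runtime overhead (often another 1-3 GB).
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# An 8B model at 4-bit quantization needs roughly 4 GB for weights,
# which is why 8 GB of RAM is a workable minimum; a 13B model needs
# roughly 6.5 GB, hence the 16 GB recommendation.
print(approx_weight_memory_gb(8, 4))   # 4.0
print(approx_weight_memory_gb(13, 4))  # 6.5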

Popular Models

Model               Parameters        Best For
Llama 3.3           70B               General chat, reasoning, instruction following
Mistral             7B                Fast inference, multilingual, coding
Code Llama          7B / 13B / 34B    Code generation, debugging, completion
Gemma 2             2B / 9B / 27B     Lightweight tasks, edge deployment
DeepSeek Coder V2   16B / 236B        Advanced code generation and analysis
Phi-3               3.8B / 14B        Small footprint, mobile-friendly
LLaVA               7B / 13B          Vision + language multimodal tasks

Build Custom Models

Create specialized models using Modelfiles — similar to Dockerfiles but for AI models:

FROM llama3.3

SYSTEM """
You are a senior software architect. You give concise, 
practical advice with code examples. You prefer modern 
patterns and always consider security implications.
"""

PARAMETER temperature 0.3
PARAMETER num_ctx 4096

Save this as Modelfile and run ollama create my-architect -f Modelfile to create your custom model.
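Since a Modelfile is plain text, you can also generate one programmatically, which is handy when templating many model variants. A minimal sketch (the render_modelfile helper is illustrative, not part of Ollama):

```python
def render_modelfile(base: str, system_prompt: str, **params) -> str:
    """Render a Modelfile string from a base model, system prompt, and parameters."""
    lines = [f"FROM {base}", "", 'SYSTEM """', system_prompt.strip(), '"""', ""]
    for name, value in params.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"

modelfile = render_modelfile(
    "llama3.3",
    "You are a senior software architect. Give concise, practical advice.",
    temperature=0.3,
    num_ctx=4096,
)

# Write it out, then: ollama create my-architect -f Modelfile
with open("Modelfile", "w") as f:
    f.write(modelfile)
```

The same pattern scales to generating one Modelfile per persona or per deployment target from a single script.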

Privacy First

Everything runs on your machine. Your prompts, conversations, and data never leave your hardware. There is no telemetry, no cloud dependency, and no API keys required. Once a model is downloaded, Ollama works completely offline.


Ollama is free, open-source (MIT licensed), and backed by an active community with over 100,000 GitHub stars. It has become a standard tool for running local LLMs in development, testing, and production self-hosted environments.