Beginner

Install Ollama & Run Your First LLM

Download Ollama, pull a model, and have a conversation with a local AI in under 5 minutes. No GPU required.

⏱ ~5 minutes 💻 Mac / Linux / Windows 🧠 No GPU needed

What You'll Need

  • A computer with at least 8GB RAM (16GB+ recommended for 7B+ models)
  • macOS, Linux, or Windows 10/11
  • A terminal app (Terminal on macOS, your distro's terminal on Linux, PowerShell on Windows)
  • ~5GB free disk space for your first model
💡 Tip: Ollama runs models on your CPU by default. If you have an Apple Silicon Mac (M1/M2/M3/M4) or an NVIDIA GPU, it will automatically use the GPU for much faster responses.

1 Install Ollama

Mac — Download from the official site or use Homebrew:

# Option A: Download the app
# Go to https://ollama.com/download and grab the Mac installer

# Option B: Homebrew
brew install ollama

Linux — One-line install script:

curl -fsSL https://ollama.com/install.sh | sh

Windows — Download from ollama.com/download and run the installer.

Verify the install:

ollama --version
# Should print something like: ollama version 0.5.x

2 Pull Your First Model

Ollama works like Docker for AI models — you pull a model once, and it stays on your machine.

# Pull Llama 3.2 3B — small, fast, great to start with
ollama pull llama3.2
# This downloads ~2GB. Wait for it to finish.
💡 Tip: The first pull takes a few minutes depending on your internet speed. After that, the model loads from disk in seconds.

3 Start Chatting

ollama run llama3.2

# You'll see a prompt like:
# >>> Send a message (/? for help)

# Try typing:
# >>> Explain quantum computing in one paragraph

That's it. You're running a language model locally on your own hardware. No API keys, no cloud, no data leaving your machine.

Press Ctrl+D or type /bye to exit the chat.

4 Try More Models

Ollama has a library of models. Here are the best ones to start with:

Model              Size    RAM Needed   Best For
llama3.2           2GB     8GB          Fast chat, Q&A, quick tasks
llama3.1:8b        4.7GB   16GB         Conversations, writing, coding
mistral            4.1GB   16GB         Balanced quality, good reasoning
codellama          3.8GB   16GB         Code generation and debugging
phi3               2.2GB   8GB          Small but surprisingly capable
deepseek-coder-v2  8.9GB   32GB         Serious code + math
# Pull and run any model
ollama pull mistral
ollama run mistral

# List all your downloaded models
ollama list

# Remove a model you don't need
ollama rm codellama

5 Use Ollama as an API

Ollama runs a local server on port 11434. You can hit it from any app, script, or tool:

# Chat via the API (works with curl, Python, JS, anything)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is the capital of France?"}],
  "stream": false
}'
💡 Tip: Ollama also exposes an OpenAI-compatible API at http://localhost:11434/v1. Many apps that work with ChatGPT can be pointed there instead — instant local AI for your existing tools.
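The same request works from any language with plain HTTP. Here is a minimal Python sketch using only the standard library; the `build_chat_request` and `chat` helper names are illustrative, not part of Ollama, but the JSON fields match the `/api/chat` endpoint shown above.

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/chat endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete response instead of chunks
    }

def chat(prompt: str, model: str = "llama3.2",
         host: str = "http://localhost:11434") -> str:
    """POST a chat request to a local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=false, the reply arrives as a single JSON object
        # whose "message" field holds the assistant's answer.
        return json.loads(resp.read())["message"]["content"]

# Example (requires Ollama running locally):
# print(chat("What is the capital of France?"))
```

Swap the `model` argument for any model you've pulled; the request shape stays the same.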

✅ What You've Set Up

  • Ollama installed and running on your machine
  • At least one local LLM downloaded and ready to chat
  • A local API server you can integrate with any tool
  • Zero cloud dependency — everything runs offline

Next Steps

  • Add voice input — pair Ollama with Whisper for speech-to-text (guide coming soon)
  • Build a model library — organize models on external NVMe storage (guide coming soon)
  • Make it always-on — set up Ollama to auto-start on boot and serve your LAN (guide coming soon)
  • Try AI OS — our project adds memory, identity, and learning loops on top of Ollama
⚠️ Note on hardware: 8GB RAM will run 3B models fine. For 7B+ models, you want 16GB minimum. For 13B+, aim for 32GB. Check our hardware store for recommended setups.
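The rule of thumb in the note above can be sketched as a tiny helper, assuming the model's parameter count in billions; the thresholds are rough guidance, not exact requirements.

```python
def min_ram_gb(model_params_b: float) -> int:
    """Rough minimum system RAM (GB) for a quantized local model,
    following the rule of thumb above (illustrative, not exact)."""
    if model_params_b <= 3:
        return 8    # 3B models run fine in 8GB
    if model_params_b < 13:
        return 16   # 7B-8B models want 16GB minimum
    return 32       # 13B+ models: aim for 32GB

# llama3.2 (3B) -> 8GB, llama3.1:8b -> 16GB, 13B+ -> 32GB
```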
