Beginner

Install Ollama & Run Your First LLM

Download Ollama, pull a model, and have a conversation with a local AI in under 5 minutes. No GPU required.

⏱ ~5 minutes 💻 Mac / Linux / Windows 🧠 No GPU needed

What You'll Need

  • A computer with at least 8GB RAM (16GB+ recommended for 7B+ models)
  • macOS, Linux, or Windows 10/11
  • A terminal app (Terminal on macOS, your distro's terminal on Linux, PowerShell on Windows)
  • ~5GB free disk space for your first model
💡 Tip: Ollama runs models on your CPU by default. If you have an Apple Silicon Mac (M1/M2/M3/M4) or an NVIDIA GPU, it will automatically use the GPU for much faster responses.

1 Install Ollama

Mac — Download from the official site or use Homebrew:

# Option A: Download the app
# Go to https://ollama.com/download and grab the Mac installer

# Option B: Homebrew
brew install ollama

Linux — One-line install script:

curl -fsSL https://ollama.com/install.sh | sh

Windows — Download from ollama.com/download and run the installer.

Verify the install:

ollama --version
# Should print something like: ollama version 0.5.x

2 Pull Your First Model

Ollama works like Docker for AI models — you pull a model once, and it stays on your machine.

# Pull Llama 3.2 3B — small, fast, great to start with
ollama pull llama3.2
# This downloads ~2GB. Wait for it to finish.
💡 Tip: The first pull takes a few minutes depending on your internet speed. After that, the model loads from disk in seconds.

3 Start Chatting

ollama run llama3.2

# You'll see a prompt like:
# >>> Send a message (/? for help)

# Try typing:
# >>> Explain quantum computing in one paragraph

That's it. You're running a language model locally on your own hardware. No API keys, no cloud, no data leaving your machine.

Press Ctrl+D or type /bye to exit the chat.

4 Try More Models

Ollama has a library of models. Here are the best ones to start with:

Model              Size    RAM Needed   Best For
llama3.2           2GB     8GB          Fast chat, Q&A, quick tasks
llama3.1:8b        4.7GB   16GB         Conversations, writing, coding
mistral            4.1GB   16GB         Balanced quality, good reasoning
codellama          3.8GB   16GB         Code generation and debugging
phi3               2.2GB   8GB          Small but surprisingly capable
deepseek-coder-v2  8.9GB   32GB         Serious code + math
# Pull and run any model
ollama pull mistral
ollama run mistral

# List all your downloaded models
ollama list

# Remove a model you don't need
ollama rm codellama

5 Use Ollama as an API

Ollama runs a local server on port 11434. You can hit it from any app, script, or tool:

# Chat via the API (works with curl, Python, JS, anything)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is the capital of France?"}],
  "stream": false
}'
💡 Tip: Ollama also exposes an OpenAI-compatible API at http://localhost:11434/v1. Many apps that work with ChatGPT can be pointed there instead — instant local AI for your existing tools.
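The same request works from any language with plain HTTP. Here is a minimal Python sketch using only the standard library; the `build_chat_request` and `chat` helper names are illustrative, not part of Ollama, but the JSON fields match the `/api/chat` endpoint shown above.

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/chat endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete response instead of chunks
    }

def chat(prompt: str, model: str = "llama3.2",
         host: str = "http://localhost:11434") -> str:
    """POST a chat request to a local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=false, the reply arrives as a single JSON object
        # whose "message" field holds the assistant's answer.
        return json.loads(resp.read())["message"]["content"]

# Example (requires Ollama running locally):
# print(chat("What is the capital of France?"))
```

Swap the `model` argument for any model you've pulled; the request shape stays the same.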

✅ What You've Set Up

  • Ollama installed and running on your machine
  • At least one local LLM downloaded and ready to chat
  • A local API server you can integrate with any tool
  • Zero cloud dependency — everything runs offline

Next Steps

  • Add voice input — pair Ollama with Whisper for speech-to-text (guide coming soon)
  • Build a model library — organize models on external NVMe storage (guide coming soon)
  • Make it always-on — set up Ollama to auto-start on boot and serve your LAN (guide coming soon)
  • Try AI OS — our project adds memory, identity, and learning loops on top of Ollama
⚠️ Note on hardware: 8GB RAM will run 3B models fine. For 7B+ models, you want 16GB minimum. For 13B+, aim for 32GB. Check our hardware store for recommended setups.
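The rule of thumb in the note above can be sketched as a tiny helper, assuming the model's parameter count in billions; the thresholds are rough guidance, not exact requirements.

```python
def min_ram_gb(model_params_b: float) -> int:
    """Rough minimum system RAM (GB) for a quantized local model,
    following the rule of thumb above (illustrative, not exact)."""
    if model_params_b <= 3:
        return 8    # 3B models run fine in 8GB
    if model_params_b < 13:
        return 16   # 7B-8B models want 16GB minimum
    return 32       # 13B+ models: aim for 32GB

# llama3.2 (3B) -> 8GB, llama3.1:8b -> 16GB, 13B+ -> 32GB
```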
