Intermediate
Build a Local Model Library
Organize your AI models on fast NVMe storage. Set up Ollama to use external drives, manage multiple model versions, and keep everything portable.
What You'll Need
- Ollama installed and working (see our Ollama guide if not)
- An NVMe SSD (500GB–2TB) in a USB-C enclosure — or internal if you have room
- At least 16GB RAM to run the bigger models
- ~30 minutes for downloads (the large models are several GB each)
💡 Why external NVMe? AI models are huge (2–50GB each). Keeping them on your boot drive eats up space fast. A $50 NVMe in a $15 USB-C enclosure gives you a portable model library that moves between machines. NVMe over USB 3.2 reads at ~1GB/s — fast enough that model load times barely change.
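If you want to sanity-check that ~1GB/s figure on your own enclosure once the drive is mounted, a rough `dd` read test is enough. A sketch: `speed_check` is our helper name, the 256 MiB size keeps it quick, and the re-read number may be flattered by OS caching.

```shell
# speed_check DIR: write a 256 MiB scratch file, re-read it, report dd's throughput line.
# The re-read can come partly from the OS cache, so treat it as a ceiling, not a benchmark.
speed_check() {
  dir=$1
  dd if=/dev/zero of="$dir/speedtest.bin" bs=1048576 count=256 2>&1 | tail -n 1
  dd if="$dir/speedtest.bin" of=/dev/null bs=1048576 2>&1 | tail -n 1
  rm -f "$dir/speedtest.bin"
}

# Example: run it against the mounted drive (Linux: /mnt/models)
if [ -d /Volumes/Models ]; then speed_check /Volumes/Models; fi
```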
1 Set Up Your Storage
If you're using an external NVMe drive, format it and mount it.
Mac:
```shell
# Plug in the NVMe enclosure via USB-C
# Open Disk Utility → select the drive → Erase
# Format: APFS | Name: "Models"
# It'll mount at /Volumes/Models
```
Linux:
```shell
# Find the drive
lsblk
# Format (replace sdX with your drive)
sudo mkfs.ext4 /dev/sdX
# Create mount point and mount
sudo mkdir -p /mnt/models
sudo mount /dev/sdX /mnt/models
# Make it permanent (add to fstab)
# (more robust: use the UUID from `sudo blkid` instead of /dev/sdX,
#  since device letters can change between boots)
echo '/dev/sdX /mnt/models ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
```
⚠️ Double-check the device name with `lsblk` before formatting. Formatting the wrong drive erases all data on it.
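To back that warning with a guard, a small check can refuse to proceed while the device is mounted. A sketch; `check_unmounted` is our helper name and `/dev/sdX` stays a placeholder:

```shell
# check_unmounted DEV: refuse if DEV shows up in the `mount` table
check_unmounted() {
  dev=$1
  if mount | grep -q "^$dev"; then
    echo "refusing: $dev is mounted; unmount it (or pick the right device) first" >&2
    return 1
  fi
  echo "$dev is not mounted; double-check its size in lsblk before formatting"
}

# usage: check_unmounted /dev/sdX && sudo mkfs.ext4 /dev/sdX
```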
2 Point Ollama to the Drive
By default, Ollama stores models in ~/.ollama/models. We want them on the external drive instead.
Mac / Linux:
```shell
# Stop Ollama first
pkill ollama
# Move existing models to the new drive
mv ~/.ollama/models /Volumes/Models/ollama-models
# (Linux: mv ~/.ollama/models /mnt/models/ollama-models)
# Create a symlink so Ollama thinks nothing changed
ln -s /Volumes/Models/ollama-models ~/.ollama/models
# (Linux: ln -s /mnt/models/ollama-models ~/.ollama/models)
# Start Ollama again
ollama serve &
```
Or use the environment variable (cleaner):
```shell
# Set Ollama's model directory
export OLLAMA_MODELS=/Volumes/Models/ollama-models
# Add to your shell profile so it persists
echo 'export OLLAMA_MODELS=/Volumes/Models/ollama-models' >> ~/.zshrc
# (bash: >> ~/.bashrc)
# (Linux with Ollama as a systemd service: the service doesn't read your shell
#  profile; set Environment="OLLAMA_MODELS=..." via `sudo systemctl edit ollama`)
```
💡 Tip: The `OLLAMA_MODELS` environment variable is the cleanest approach. No symlinks to break, and you can switch between drives just by changing the variable.
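In that spirit, a tiny shell function makes the switch explicit. A sketch: `use_models` is our name, not an Ollama command, and you'll still need to restart `ollama serve` after switching:

```shell
# use_models DIR: point Ollama at a different model directory for this shell session.
# Restart `ollama serve` afterwards so it picks up the new location.
use_models() {
  if [ -d "$1" ]; then
    export OLLAMA_MODELS=$1
    echo "OLLAMA_MODELS -> $OLLAMA_MODELS"
  else
    echo "no such directory: $1" >&2
    return 1
  fi
}

# usage: use_models /Volumes/Models/ollama-models
```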
Verify it's working:
```shell
ollama list
# Should show your existing models loading from the new location
du -sh /Volumes/Models/ollama-models
# Check how much space your models are using
```
3 Build Your Model Collection
Here's a curated starter library organized by task. Pull what you need.
| Task | Model | Size | Why This One |
|---|---|---|---|
| General chat | llama3.2 | 2GB | Fast, smart, great default for everyday use |
| General (bigger) | llama3.1:8b | 4.7GB | Noticeably smarter, needs 8GB+ RAM |
| Coding | deepseek-coder-v2:16b | 8.9GB | Excellent at code gen, debugging, refactoring |
| Coding (light) | codellama:7b | 3.8GB | Lighter option for code on limited RAM |
| Vision | llava:7b | 4.5GB | Describe images, read screenshots, OCR |
| Writing | mistral | 4.1GB | Clean prose, good for drafting and editing |
| Reasoning | deepseek-r1:8b | 4.9GB | Chain-of-thought, math, complex logic |
| Embedding | nomic-embed-text | 274MB | Generate embeddings for RAG / search |
```shell
# Pull your starter library
ollama pull llama3.2
ollama pull mistral
ollama pull deepseek-coder-v2:16b
ollama pull llava:7b
ollama pull nomic-embed-text
# Check your collection
ollama list
```
💡 Tip: Start small. Pull `llama3.2` and `mistral` as your daily drivers, then add specialized models as you need them. There's no point downloading 50GB of models you won't use.
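If you rebuild machines often, the pulls above can be wrapped in a loop that skips anything already installed. A sketch; the `pull_missing` name and the `MODELS` list are ours to edit:

```shell
# pull_missing: pull each model in $MODELS unless `ollama list` already shows it
pull_missing() {
  for m in $MODELS; do
    if ollama list 2>/dev/null | awk 'NR > 1 {print $1}' | grep -Eq "^$m(:|$)"; then
      echo "skip $m (already installed)"
    else
      ollama pull "$m"
    fi
  done
}

MODELS="llama3.2 mistral nomic-embed-text"
if command -v ollama >/dev/null; then pull_missing; fi
```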
4 Create Custom Modelfiles
Modelfiles let you customize a model's behavior — system prompt, temperature, context length. Think of them as presets.
```shell
# Create a Modelfile for a coding assistant
cat > ~/Modelfile-coder << 'EOF'
FROM deepseek-coder-v2:16b
SYSTEM "You are a senior software engineer. Write clean, well-documented code. Prefer Python unless asked otherwise. Always explain your reasoning."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
EOF
# Build it
ollama create coder -f ~/Modelfile-coder
# Now use it like any model
ollama run coder "Write a Python script that watches a folder for new files"
```
More Modelfile ideas:
```shell
# A creative writer with high temperature
cat > ~/Modelfile-writer << 'EOF'
FROM mistral
SYSTEM "You are a creative writing assistant. Write vivid, engaging prose. Vary sentence length and structure."
PARAMETER temperature 0.9
PARAMETER num_ctx 4096
EOF
ollama create writer -f ~/Modelfile-writer

# A research assistant with low temperature
cat > ~/Modelfile-research << 'EOF'
FROM llama3.1:8b
SYSTEM "You are a research analyst. Be precise, cite your reasoning, flag uncertainty. Use bullet points for clarity."
PARAMETER temperature 0.1
PARAMETER num_ctx 8192
EOF
ollama create research -f ~/Modelfile-research
```
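Beyond `temperature` and `num_ctx`, Ollama's Modelfile format supports a handful of other tuning parameters. A non-exhaustive sketch, with illustrative values rather than recommendations:

```
# Example Modelfile showing a few more knobs
FROM llama3.2
SYSTEM "You are a concise assistant."
# randomness: ~0.1 focused, ~0.9 creative
PARAMETER temperature 0.5
# context window in tokens (bigger = more RAM)
PARAMETER num_ctx 8192
# nucleus-sampling cutoff
PARAMETER top_p 0.9
# discourage verbatim repetition
PARAMETER repeat_penalty 1.1
# cap on tokens generated per response
PARAMETER num_predict 512
```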
💡 Tip: Custom models are just pointers — they don't duplicate the base model's weights. So `coder`, `writer`, and `research` all share the original model files. No extra disk space.
5 Manage and Maintain Your Library
Useful commands for keeping your model library organized:
```shell
# List all models with sizes
ollama list
# See detailed info about a model
ollama show llama3.2
# Remove a model you no longer need
ollama rm codellama:7b
# Update a model to the latest version
ollama pull llama3.2
# (re-pulling downloads only the diff)
# Check total disk usage
du -sh $OLLAMA_MODELS
# Copy your entire library to another machine
rsync -avh /Volumes/Models/ollama-models/ user@other-machine:/path/to/models/
```
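For that rsync step, a guard against an empty source helps: if the external drive failed to mount, its mount point is often just an empty directory, and syncing it would mirror nothing over a good copy. A sketch; `backup_models` is our helper name:

```shell
# backup_models SRC DEST: rsync the library, but only when SRC exists and is non-empty
# (an unmounted external drive often leaves an empty mount-point directory behind)
backup_models() {
  src=$1; dest=$2
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
    echo "source $src missing or empty; aborting" >&2
    return 1
  fi
  rsync -avh "$src/" "$dest/"
}

# usage: backup_models /Volumes/Models/ollama-models /Backup/ollama-models
```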
Storage planning by library size:
| Library Tier | Models | Space Needed | Drive |
|---|---|---|---|
| Starter | 2-3 small models | ~10GB | Any SSD or boot drive |
| Working | 5-8 mixed models | ~30-50GB | 256GB+ NVMe |
| Full | 10-15 models + customs | ~100-200GB | 500GB+ NVMe |
| Lab | Everything + fine-tunes | ~500GB+ | 1-2TB NVMe |
⚠️ Don't forget to eject safely. If your models are on an external drive, always eject it properly before unplugging. Pulling the cable while Ollama is loading a model can corrupt the file and you'll need to re-download it.
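A small helper enforces that order of operations: stop Ollama first, then eject. A sketch; `safe_eject` is our name, with the macOS `diskutil` call shown and the Linux equivalent in a comment:

```shell
# safe_eject VOLUME: refuse to eject while Ollama is still running
safe_eject() {
  if pgrep -x ollama >/dev/null 2>&1; then
    echo "ollama is still running; stop it first (pkill ollama)" >&2
    return 1
  fi
  diskutil eject "$1"   # Linux: sudo umount "$1"
}

# usage: safe_eject /Volumes/Models
```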
✅ What You've Set Up
- Ollama running from fast external NVMe storage — portable between machines
- A curated model collection organized by task (chat, code, vision, writing, reasoning)
- Custom Modelfiles for different workflows — coding, writing, research
- Commands for maintaining, updating, and syncing your model library
Next Steps
- Set up an always-on AI server — dedicate a mini PC to run Ollama 24/7 so any device on your network can use it. Great for sharing your model library across machines.
- Try GGUF models from Hugging Face — Ollama can import any GGUF model: run `ollama create mymodel -f Modelfile` with a `FROM ./model.gguf` line in the Modelfile.
- Fine-tune a model — train a model on your own data for specialized tasks. Check our fine-tuning guide when it's live.
- Add embeddings for RAG — use `nomic-embed-text` to build a searchable knowledge base from your documents.
⚠️ Model quality varies by quantization. The `:7b` or `:13b` in a tag is the parameter count, but the quantization suffix (Q4_K_M vs Q8_0) matters too: more bits means more accuracy but bigger files. Stick with Q4_K_M for the best size/quality balance.