Intermediate
Build a Local Model Library
Organize your AI models on fast NVMe storage. Set up Ollama to use external drives, manage multiple model versions, and keep everything portable.
What You'll Need
- Ollama installed and working (see our Ollama guide if not)
- An NVMe SSD (500GB–2TB) in a USB-C enclosure — or internal if you have room
- At least 16GB RAM to run the bigger models
- ~30 minutes for downloads (the large models are several GB each)
💡 Why external NVMe? AI models are huge (2–50GB each). Keeping them on your boot drive eats up space fast. A $50 NVMe in a $15 USB-C enclosure gives you a portable model library that moves between machines. NVMe over USB 3.2 reads at ~1GB/s — fast enough that model load times barely change.
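If you want to sanity-check that ~1GB/s figure on your own enclosure once the drive is mounted, a rough `dd` read test is enough. A sketch: `speed_check` is our helper name, the 256 MiB size keeps it quick, and the re-read number may be flattered by OS caching.

```shell
# speed_check DIR: write a 256 MiB scratch file, re-read it, report dd's throughput line.
# The re-read can come partly from the OS cache, so treat it as a ceiling, not a benchmark.
speed_check() {
  dir=$1
  dd if=/dev/zero of="$dir/speedtest.bin" bs=1048576 count=256 2>&1 | tail -n 1
  dd if="$dir/speedtest.bin" of=/dev/null bs=1048576 2>&1 | tail -n 1
  rm -f "$dir/speedtest.bin"
}

# Example: run it against the mounted drive (Linux: /mnt/models)
if [ -d /Volumes/Models ]; then speed_check /Volumes/Models; fi
```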
1 Set Up Your Storage
If you're using an external NVMe drive, format it and mount it.
Mac:
```shell
# Plug in the NVMe enclosure via USB-C
# Open Disk Utility → select the drive → Erase
# Format: APFS | Name: "Models"
# It'll mount at /Volumes/Models
```
Linux:
```shell
# Find the drive
lsblk
# Format (replace sdX with your drive)
sudo mkfs.ext4 /dev/sdX
# Create mount point and mount
sudo mkdir -p /mnt/models
sudo mount /dev/sdX /mnt/models
# Make it permanent (add to fstab)
# (more robust: use the UUID from `sudo blkid` instead of /dev/sdX,
#  since device letters can change between boots)
echo '/dev/sdX /mnt/models ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
```
⚠️ Double-check the device name with `lsblk` before formatting. Formatting the wrong drive erases all data on it.
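To back that warning with a guard, a small check can refuse to proceed while the device is mounted. A sketch; `check_unmounted` is our helper name and `/dev/sdX` stays a placeholder:

```shell
# check_unmounted DEV: refuse if DEV shows up in the `mount` table
check_unmounted() {
  dev=$1
  if mount | grep -q "^$dev"; then
    echo "refusing: $dev is mounted; unmount it (or pick the right device) first" >&2
    return 1
  fi
  echo "$dev is not mounted; double-check its size in lsblk before formatting"
}

# usage: check_unmounted /dev/sdX && sudo mkfs.ext4 /dev/sdX
```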
2 Point Ollama to the Drive
By default, Ollama stores models in ~/.ollama/models. We want them on the external drive instead.
Mac / Linux:
```shell
# Stop Ollama first
pkill ollama
# Move existing models to the new drive
mv ~/.ollama/models /Volumes/Models/ollama-models
# (Linux: mv ~/.ollama/models /mnt/models/ollama-models)
# Create a symlink so Ollama thinks nothing changed
ln -s /Volumes/Models/ollama-models ~/.ollama/models
# (Linux: ln -s /mnt/models/ollama-models ~/.ollama/models)
# Start Ollama again
ollama serve &
```
Or use the environment variable (cleaner):
```shell
# Set Ollama's model directory
export OLLAMA_MODELS=/Volumes/Models/ollama-models
# Add to your shell profile so it persists
echo 'export OLLAMA_MODELS=/Volumes/Models/ollama-models' >> ~/.zshrc
# (bash: >> ~/.bashrc)
# (Linux with Ollama as a systemd service: the service doesn't read your shell
#  profile; set Environment="OLLAMA_MODELS=..." via `sudo systemctl edit ollama`)
```
💡 Tip: The `OLLAMA_MODELS` environment variable is the cleanest approach. No symlinks to break, and you can switch between drives just by changing the variable.
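In that spirit, a tiny shell function makes the switch explicit. A sketch: `use_models` is our name, not an Ollama command, and you'll still need to restart `ollama serve` after switching:

```shell
# use_models DIR: point Ollama at a different model directory for this shell session.
# Restart `ollama serve` afterwards so it picks up the new location.
use_models() {
  if [ -d "$1" ]; then
    export OLLAMA_MODELS=$1
    echo "OLLAMA_MODELS -> $OLLAMA_MODELS"
  else
    echo "no such directory: $1" >&2
    return 1
  fi
}

# usage: use_models /Volumes/Models/ollama-models
```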
Verify it's working:
```shell
ollama list
# Should show your existing models loading from the new location
du -sh /Volumes/Models/ollama-models
# Check how much space your models are using
```
3 Build Your Model Collection
Here's a curated starter library organized by task. Pull what you need.
| Task | Model | Size | Why This One |
|---|---|---|---|
| General chat | llama3.2 | 2GB | Fast, smart, great default for everyday use |
| General (bigger) | llama3.1:8b | 4.7GB | Noticeably smarter, needs 8GB+ RAM |
| Coding | deepseek-coder-v2:16b | 8.9GB | Excellent at code gen, debugging, refactoring |
| Coding (light) | codellama:7b | 3.8GB | Lighter option for code on limited RAM |
| Vision | llava:7b | 4.5GB | Describe images, read screenshots, OCR |
| Writing | mistral | 4.1GB | Clean prose, good for drafting and editing |
| Reasoning | deepseek-r1:8b | 4.9GB | Chain-of-thought, math, complex logic |
| Embedding | nomic-embed-text | 274MB | Generate embeddings for RAG / search |
```shell
# Pull your starter library
ollama pull llama3.2
ollama pull mistral
ollama pull deepseek-coder-v2:16b
ollama pull llava:7b
ollama pull nomic-embed-text
# Check your collection
ollama list
```
💡 Tip: Start small. Pull `llama3.2` and `mistral` as your daily drivers, then add specialized models as you need them. There's no point downloading 50GB of models you won't use.
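If you rebuild machines often, the pulls above can be wrapped in a loop that skips anything already installed. A sketch; the `pull_missing` name and the `MODELS` list are ours to edit:

```shell
# pull_missing: pull each model in $MODELS unless `ollama list` already shows it
pull_missing() {
  for m in $MODELS; do
    if ollama list 2>/dev/null | awk 'NR > 1 {print $1}' | grep -Eq "^$m(:|$)"; then
      echo "skip $m (already installed)"
    else
      ollama pull "$m"
    fi
  done
}

MODELS="llama3.2 mistral nomic-embed-text"
if command -v ollama >/dev/null; then pull_missing; fi
```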
4 Create Custom Modelfiles
Modelfiles let you customize a model's behavior — system prompt, temperature, context length. Think of them as presets.
```shell
# Create a Modelfile for a coding assistant
cat > ~/Modelfile-coder << 'EOF'
FROM deepseek-coder-v2:16b
SYSTEM "You are a senior software engineer. Write clean, well-documented code. Prefer Python unless asked otherwise. Always explain your reasoning."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
EOF
# Build it
ollama create coder -f ~/Modelfile-coder
# Now use it like any model
ollama run coder "Write a Python script that watches a folder for new files"
```
More Modelfile ideas:
```shell
# A creative writer with high temperature
cat > ~/Modelfile-writer << 'EOF'
FROM mistral
SYSTEM "You are a creative writing assistant. Write vivid, engaging prose. Vary sentence length and structure."
PARAMETER temperature 0.9
PARAMETER num_ctx 4096
EOF
ollama create writer -f ~/Modelfile-writer

# A research assistant with low temperature
cat > ~/Modelfile-research << 'EOF'
FROM llama3.1:8b
SYSTEM "You are a research analyst. Be precise, cite your reasoning, flag uncertainty. Use bullet points for clarity."
PARAMETER temperature 0.1
PARAMETER num_ctx 8192
EOF
ollama create research -f ~/Modelfile-research
```
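Beyond `temperature` and `num_ctx`, Ollama's Modelfile format supports a handful of other tuning parameters. A non-exhaustive sketch, with illustrative values rather than recommendations:

```
# Example Modelfile showing a few more knobs
FROM llama3.2
SYSTEM "You are a concise assistant."
# randomness: ~0.1 focused, ~0.9 creative
PARAMETER temperature 0.5
# context window in tokens (bigger = more RAM)
PARAMETER num_ctx 8192
# nucleus-sampling cutoff
PARAMETER top_p 0.9
# discourage verbatim repetition
PARAMETER repeat_penalty 1.1
# cap on tokens generated per response
PARAMETER num_predict 512
```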
💡 Tip: Custom models are just pointers — they don't duplicate the base model's weights. So `coder`, `writer`, and `research` all share the original model files. No extra disk space.
5 Manage and Maintain Your Library
Useful commands for keeping your model library organized:
```shell
# List all models with sizes
ollama list
# See detailed info about a model
ollama show llama3.2
# Remove a model you no longer need
ollama rm codellama:7b
# Update a model to the latest version
ollama pull llama3.2
# (re-pulling downloads only the diff)
# Check total disk usage
du -sh $OLLAMA_MODELS
# Copy your entire library to another machine
rsync -avh /Volumes/Models/ollama-models/ user@other-machine:/path/to/models/
```
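For that rsync step, a guard against an empty source helps: if the external drive failed to mount, its mount point is often just an empty directory, and syncing it would mirror nothing over a good copy. A sketch; `backup_models` is our helper name:

```shell
# backup_models SRC DEST: rsync the library, but only when SRC exists and is non-empty
# (an unmounted external drive often leaves an empty mount-point directory behind)
backup_models() {
  src=$1; dest=$2
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
    echo "source $src missing or empty; aborting" >&2
    return 1
  fi
  rsync -avh "$src/" "$dest/"
}

# usage: backup_models /Volumes/Models/ollama-models /Backup/ollama-models
```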
Storage planning by library size:
| Library Tier | Models | Space Needed | Drive |
|---|---|---|---|
| Starter | 2-3 small models | ~10GB | Any SSD or boot drive |
| Working | 5-8 mixed models | ~30-50GB | 256GB+ NVMe |
| Full | 10-15 models + customs | ~100-200GB | 500GB+ NVMe |
| Lab | Everything + fine-tunes | ~500GB+ | 1-2TB NVMe |
⚠️ Don't forget to eject safely. If your models are on an external drive, always eject it properly before unplugging. Pulling the cable while Ollama is loading a model can corrupt the file and you'll need to re-download it.
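A small helper enforces that order of operations: stop Ollama first, then eject. A sketch; `safe_eject` is our name, with the macOS `diskutil` call shown and the Linux equivalent in a comment:

```shell
# safe_eject VOLUME: refuse to eject while Ollama is still running
safe_eject() {
  if pgrep -x ollama >/dev/null 2>&1; then
    echo "ollama is still running; stop it first (pkill ollama)" >&2
    return 1
  fi
  diskutil eject "$1"   # Linux: sudo umount "$1"
}

# usage: safe_eject /Volumes/Models
```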
✅ What You've Set Up
- Ollama running from fast external NVMe storage — portable between machines
- A curated model collection organized by task (chat, code, vision, writing, reasoning)
- Custom Modelfiles for different workflows — coding, writing, research
- Commands for maintaining, updating, and syncing your model library
Next Steps
- Set up an always-on AI server — dedicate a mini PC to run Ollama 24/7 so any device on your network can use it. Great for sharing your model library across machines.
- Try GGUF models from Hugging Face — Ollama can import any GGUF model: run `ollama create mymodel -f Modelfile` with a `FROM ./model.gguf` line in the Modelfile.
- Fine-tune a model — train a model on your own data for specialized tasks. Check our fine-tuning guide when it's live.
- Add embeddings for RAG — use `nomic-embed-text` to build a searchable knowledge base from your documents.
⚠️ Model quality varies by quantization. The `:7b` or `:13b` in a tag is the parameter count, but the quantization suffix (Q4_K_M vs Q8_0) matters too: more bits means more accuracy but bigger files. Stick with Q4_K_M for the best size/quality balance.