Intermediate

Build a Local Model Library

Organize your AI models on fast NVMe storage. Set up Ollama to use external drives, manage multiple model versions, and keep everything portable.

⏱ ~15 minutes 💻 Mac / Linux / Windows 💾 500GB+ NVMe recommended

What You'll Need

  • Ollama installed and working (see our Ollama guide if not)
  • An NVMe SSD (500GB–2TB) in a USB-C enclosure — or internal if you have room
  • At least 16GB RAM to run the bigger models
  • ~30 minutes for downloads (the large models are several GB each)
💡 Why external NVMe? AI models are large (2–50GB each), so keeping them on your boot drive eats up space fast. A $50 NVMe drive in a $15 USB-C enclosure gives you a portable model library that moves between machines. Over a 10Gbps USB 3.2 Gen 2 link, NVMe reads at roughly 1GB/s, which is fast enough that model load times barely change.
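To put the ~1GB/s figure in perspective, here is a rough back-of-the-envelope sketch: load time is approximately model size divided by sustained read speed. The helper name load_seconds and the 3GB/s internal-drive figure are assumptions for illustration; real throughput depends on the drive, enclosure, cable, and port.

```shell
# Rough load-time estimate: model size (GB) divided by read speed (GB/s).
# Both inputs are ballpark assumptions, not measurements.
load_seconds() {
  # $1 = model size in GB, $2 = sustained read speed in GB/s
  awk -v size="$1" -v speed="$2" 'BEGIN { printf "%.1f\n", size / speed }'
}

load_seconds 4.7 1.0   # a 4.7GB model over ~1GB/s USB: prints 4.7
load_seconds 4.7 3.0   # same model on a hypothetical 3GB/s internal drive: prints 1.6
```

A few seconds of difference on a cold load is usually not noticeable, and once a model is in memory the drive speed no longer matters.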

1 Set Up Your Storage

If you're using an external NVMe drive, format it and mount it.

Mac:

# Plug in the NVMe enclosure via USB-C
# Open Disk Utility → select the drive → Erase
# Format: APFS | Name: "Models"
# It'll mount at /Volumes/Models

Linux:

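# Find the drive
lsblk
# Format (replace sdX with your drive; this formats the whole device)
sudo mkfs.ext4 /dev/sdX
# Create mount point and mount
sudo mkdir -p /mnt/models
sudo mount /dev/sdX /mnt/models
# Give your user ownership so you can write to the drive without sudo
sudo chown "$USER" /mnt/models
# Make it permanent (add to fstab by UUID, since sdX names can change between boots)
echo "UUID=$(sudo blkid -s UUID -o value /dev/sdX) /mnt/models ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab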
⚠️ Double-check the device name with lsblk before formatting. Formatting the wrong drive erases all data on it.
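Before pointing Ollama at the drive, it can help to confirm that the path is writable and has room for the models you plan to pull. The sketch below is one way to do that; free_gb and drive_ready are hypothetical helper names, and the 50GB threshold is only an example. Note that it checks the path, not that a separate drive is actually mounted there.

```shell
# Report whole gigabytes free on the filesystem that holds a path.
free_gb() {
  # df -P prints POSIX-format output; -k forces 1024-byte blocks.
  # Column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed only if the path exists, is writable, and has at least $2 GB free.
drive_ready() {
  [ -d "$1" ] && [ -w "$1" ] && [ "$(free_gb "$1")" -ge "$2" ]
}

# Example: /Volumes/Models on Mac, /mnt/models on Linux
if drive_ready /Volumes/Models 50; then
  echo "Path is writable and has room"
else
  echo "Path missing, read-only, or short on space"
fi
```

On Linux, an empty /mnt/models directory still exists when the drive is unplugged, so the free space reported would be that of the boot drive. Running mountpoint -q /mnt/models first is one way to rule that out.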

2 Point Ollama to the Drive

By default, Ollama stores models in ~/.ollama/models (Linux installs that run Ollama as a systemd service use /usr/share/ollama/.ollama/models instead). We want them on the external drive.

Mac / Linux:

# Stop Ollama first (on a Linux systemd install, use: sudo systemctl stop ollama)
pkill ollama
# Move existing models to the new drive
mv ~/.ollama/models /Volumes/Models/ollama-models
# (Linux: mv ~/.ollama/models /mnt/models/ollama-models)
# Create a symlink so Ollama thinks nothing changed
ln -s /Volumes/Models/ollama-models ~/.ollama/models
# (Linux: ln -s /mnt/models/ollama-models ~/.ollama/models)
# Start Ollama again
ollama serve &

Or use the environment variable (cleaner):

# Set Ollama's model directory
export OLLAMA_MODELS=/Volumes/Models/ollama-models
# Add to your shell profile so it persists
echo 'export OLLAMA_MODELS=/Volumes/Models/ollama-models' >> ~/.zshrc
# (bash: >> ~/.bashrc)
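💡 Tip: The OLLAMA_MODELS environment variable is the cleanest approach. No symlinks to break, and you can switch between drives just by changing the variable. The variable has to be visible to the Ollama server process: an export in your shell profile covers ollama serve started from a terminal, the Mac menu bar app needs launchctl setenv OLLAMA_MODELS followed by the path and an app restart, and a Linux systemd service needs an Environment= line added with systemctl edit ollama.service.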
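Since the drive is removable, one option is to set the variable only when the drive is actually present. The following is a minimal sketch for a shell profile; pick_models_dir is a hypothetical helper, and the two candidate paths are the Mac and Linux locations used in this guide.

```shell
# Print the first candidate directory that exists; fail if none do.
pick_models_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# In ~/.zshrc or ~/.bashrc: only override the default when the drive is there.
if models_dir=$(pick_models_dir /Volumes/Models/ollama-models /mnt/models/ollama-models); then
  export OLLAMA_MODELS="$models_dir"
else
  unset OLLAMA_MODELS   # fall back to Ollama's default location
fi
```

With this guard, starting Ollama without the drive plugged in falls back to the default directory, so ollama list will show whatever is stored there (possibly nothing) rather than your external library.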

Verify it's working:

ollama list
# Should show your existing models loading from the new location
du -sh /Volumes/Models/ollama-models
# Check how much space your models are using
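If ollama list comes back empty, a quick sanity check is whether the directory Ollama is using actually looks like a model store. This sketch assumes the store contains blobs and manifests subdirectories, which is how current Ollama versions lay it out; looks_like_model_store is a hypothetical helper name.

```shell
# Succeed if a directory looks like an Ollama model store.
# Assumption: the store holds "blobs" and "manifests" subdirectories.
looks_like_model_store() {
  [ -d "$1/blobs" ] && [ -d "$1/manifests" ]
}

# Check OLLAMA_MODELS if set, otherwise the default location.
store="${OLLAMA_MODELS:-$HOME/.ollama/models}"
if looks_like_model_store "$store"; then
  echo "Model store found at $store"
else
  echo "No model store at $store"
fi
```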

3 Build Your Model Collection

Here's a curated starter library organized by task. Pull what you need.

Task | Model | Size | Why This One
General chat | llama3.2 | 2GB | Fast, smart, great default for everyday use
General (bigger) | llama3.1:8b | 4.7GB | Noticeably smarter, needs 8GB+ RAM
Coding | deepseek-coder-v2:16b | 8.9GB | Excellent at code gen, debugging, refactoring
Coding (light) | codellama:7b | 3.8GB | Lighter option for code on limited RAM
Vision | llava:7b | 4.5GB | Describe images, read screenshots, OCR
Writing | mistral | 4.1GB | Clean prose, good for drafting and editing
Reasoning | deepseek-r1:8b | 4.9GB | Chain-of-thought, math, complex logic
Embedding | nomic-embed-text | 274MB | Generate embeddings for RAG / search
# Pull your starter library
ollama pull llama3.2
ollama pull mistral
ollama pull deepseek-coder-v2:16b
ollama pull llava:7b
ollama pull nomic-embed-text
# Check your collection
ollama list
💡 Tip: Start small. Pull llama3.2 and mistral as your daily drivers, then add specialized models as you need them. There's no point downloading 50GB of models you won't use.
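Before pulling, it is worth adding up what a pull list will cost in disk space and download time. A minimal sketch, using the sizes from the table above for the five starter models (total_gb is a hypothetical helper):

```shell
# Sum a list of sizes in GB, one per argument.
total_gb() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# llama3.2, mistral, deepseek-coder-v2:16b, llava:7b, nomic-embed-text (274MB = 0.274GB)
total_gb 2 4.1 8.9 4.5 0.274   # prints 19.8
```

So the full starter pull list is roughly 20GB, most of it from the coding model.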

4 Create Custom Modelfiles

Modelfiles let you customize a model's behavior — system prompt, temperature, context length. Think of them as presets.

# Create a Modelfile for a coding assistant
cat > ~/Modelfile-coder << 'EOF'
FROM deepseek-coder-v2:16b
SYSTEM "You are a senior software engineer. Write clean, well-documented code. Prefer Python unless asked otherwise. Always explain your reasoning."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
EOF

# Build it
ollama create coder -f ~/Modelfile-coder

# Now use it like any model
ollama run coder "Write a Python script that watches a folder for new files"

More Modelfile ideas:

# A creative writer with high temperature
cat > ~/Modelfile-writer << 'EOF'
FROM mistral
SYSTEM "You are a creative writing assistant. Write vivid, engaging prose. Vary sentence length and structure."
PARAMETER temperature 0.9
PARAMETER num_ctx 4096
EOF
ollama create writer -f ~/Modelfile-writer

# A research assistant with low temperature
cat > ~/Modelfile-research << 'EOF'
FROM llama3.1:8b
SYSTEM "You are a research analyst. Be precise, cite your reasoning, flag uncertainty. Use bullet points for clarity."
PARAMETER temperature 0.1
PARAMETER num_ctx 8192
EOF
ollama create research -f ~/Modelfile-research
💡 Tip: Custom models are just pointers: they don't duplicate the base model's weights. coder, writer, and research each reuse the weight files of the base model named in their FROM line, so the presets themselves take almost no extra disk space.
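If you end up with many presets, the heredocs get repetitive. One possible shortcut is a small function that writes a Modelfile from a few arguments. This is only a sketch: write_modelfile and the "reviewer" preset are hypothetical, and the function only emits the four directives used in this guide.

```shell
# Write a Modelfile preset. Arguments:
#   $1 output path, $2 base model, $3 temperature, $4 num_ctx, $5 system prompt
# Note: the system prompt must not contain double quotes.
write_modelfile() {
  cat > "$1" <<EOF
FROM $2
SYSTEM "$5"
PARAMETER temperature $3
PARAMETER num_ctx $4
EOF
}

# Example: a hypothetical code-review preset, written to a temp directory
preset="${TMPDIR:-/tmp}/Modelfile-reviewer"
write_modelfile "$preset" llama3.1:8b 0.2 8192 \
  "You are a careful code reviewer. Point out bugs before style issues."
cat "$preset"
```

From there, ollama create reviewer -f followed by the preset path builds it like any other Modelfile.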

5 Manage and Maintain Your Library

Useful commands for keeping your model library organized:

# List all models with sizes
ollama list
# See detailed info about a model
ollama show llama3.2
# Remove a model you no longer need
ollama rm codellama:7b
# Update a model to the latest version
ollama pull llama3.2
# (re-pulling downloads only the layers that changed)
# Check total disk usage
du -sh "$OLLAMA_MODELS"
# Copy your entire library to another machine
rsync -avh /Volumes/Models/ollama-models/ user@other-machine:/path/to/models/
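To update the whole library in one go, you can loop over the names that ollama list reports. The sketch below assumes the first line of that output is a header and that the model name is the first column; model_names is a hypothetical helper. Custom models built from local Modelfiles have nothing to pull from the registry, so failures are reported and skipped.

```shell
# Extract model names from `ollama list` output read on stdin.
# Assumption: line 1 is a header and NAME is the first column.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Re-pull every installed model, if the ollama CLI is available.
if command -v ollama >/dev/null 2>&1; then
  ollama list | model_names | while read -r name; do
    # </dev/null keeps ollama from consuming the loop's input
    ollama pull "$name" </dev/null || echo "Skipped $name (local-only model?)"
  done
fi
```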

Storage planning by library size:

Library Tier | Models | Space Needed | Drive
Starter | 2-3 small models | ~10GB | Any SSD or boot drive
Working | 5-8 mixed models | ~30-50GB | 256GB+ NVMe
Full | 10-15 models + customs | ~100-200GB | 500GB+ NVMe
Lab | Everything + fine-tunes | ~500GB+ | 1-2TB NVMe
⚠️ Don't forget to eject safely. If your models are on an external drive, stop Ollama and eject the drive properly before unplugging. Pulling the cable while Ollama is downloading or loading a model can corrupt the file, and you'll need to re-download it.

✅ What You've Set Up

  • Ollama running from fast external NVMe storage — portable between machines
  • A curated model collection organized by task (chat, code, vision, writing, reasoning)
  • Custom Modelfiles for different workflows — coding, writing, research
  • Commands for maintaining, updating, and syncing your model library

Next Steps

  • Set up an always-on AI server — dedicate a mini PC to run Ollama 24/7 so any device on your network can use it. Great for sharing your model library across machines.
  • Try GGUF models from Hugging Face — Ollama can import any GGUF model: ollama create mymodel -f Modelfile with a FROM ./model.gguf line.
  • Fine-tune a model — train a model on your own data for specialized tasks. Check our fine-tuning guide when it's live.
  • Add embeddings for RAG — use nomic-embed-text to build a searchable knowledge base from your documents.
⚠️ Model quality varies by quantization. A tag like :7b or :13b gives the parameter count. The quantization suffix (Q4_K_M, Q8_0, and so on) matters too: more bits per weight means more accurate output but a bigger file. Stick with Q4_K_M for the best size/quality balance.
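The size side of that trade-off is simple arithmetic: weight size in GB is roughly parameters (in billions) times bits per weight, divided by 8. The sketch below uses round 4-bit and 8-bit figures as an approximation; approx_size_gb is a hypothetical helper, and real files come out somewhat larger because formats like Q4_K_M average a little over 4 bits per weight and include metadata.

```shell
# Approximate weight size in GB: parameters (billions) x bits per weight / 8.
approx_size_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

approx_size_gb 7 4   # 7B model at ~4 bits per weight: prints 3.5
approx_size_gb 7 8   # 7B model at ~8 bits per weight: prints 7.0
```

That lines up with the 3.8GB listed for codellama:7b earlier, and shows why an 8-bit version of the same model needs about twice the disk space and RAM.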
