Ollama Local Models
Ollama lets you run AI models on your own computer—fully offline, fully private.
What is Ollama
Ollama is an open-source tool for running large language models locally.
Benefits:
- Completely offline, no internet needed
- Data never leaves your computer
- Free to use, no API key needed
- Supports many open-source models
Limitations:
- Requires decent hardware
- Models are generally less capable than cloud ones
- First use requires downloading models
Installing Ollama
macOS
# Using Homebrew
brew install ollama
# Or download installer from
# https://ollama.ai/download
Windows
- Go to ollama.ai/download
- Download Windows installer
- Run installation
Verify Installation
After installation, Ollama runs as a background service.
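A quick way to check is to ask the CLI for its version and hit the local API (the service listens on port 11434 by default):
# Print the installed version
ollama --version
# The background service should answer with "Ollama is running"
curl http://localhost:11434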
Downloading Models
From Terminal
# Download Llama 3
ollama pull llama3
# Download Mistral
ollama pull mistral
# Download Qwen (good for Chinese)
ollama pull qwen2
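Once a model has been pulled, you can confirm it downloaded and give it a quick smoke test from the same terminal (llama3 below is just an example; use whichever model you pulled):
# List downloaded models
ollama list
# Interactive test chat; type /bye to exit
ollama run llama3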
From MoryFlow
- Open Settings → Model Management
- Find the Ollama section
- Browse available models
- Click download

Recommended Models
General Use
| Model | Size | Features | Command |
|-------|------|----------|---------|
| Llama 3 8B | ~4GB | Well-balanced | ollama pull llama3 |
| Mistral 7B | ~4GB | Strong reasoning | ollama pull mistral |
| Qwen2 7B | ~4GB | Good Chinese support | ollama pull qwen2 |
Lower-end Hardware
| Model | Size | Features | Command |
|-------|------|----------|---------|
| Phi-3 Mini | ~2GB | Microsoft, lightweight | ollama pull phi3 |
| Gemma 2B | ~1.5GB | Google | ollama pull gemma:2b |
Powerful Hardware
| Model | Size | Features | Command |
|-------|------|----------|---------|
| Llama 3 70B | ~40GB | Near GPT-4 level | ollama pull llama3:70b |
| Mixtral 8x7B | ~26GB | Mixture of experts | ollama pull mixtral |
Configuring in MoryFlow
1. Configure Connection
- Open Settings → Model Management
- Find Ollama section
- Confirm the endpoint address (default: http://localhost:11434); you can test that it responds with the check below
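To confirm the endpoint is reachable before selecting a model, you can query Ollama's tags API (this assumes the default address above):
# Returns a JSON list of locally installed models
curl http://localhost:11434/api/tags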

2. Select Model
Once configured, downloaded Ollama models appear in the model selector on the chat panel.
3. Start Using
Select an Ollama model and start chatting. All processing happens locally.
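If you ever need to verify that a model responds outside of MoryFlow, you can call the local Ollama API directly; the model name and prompt below are only placeholders:
# One-off, non-streaming generation request
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Say hello", "stream": false}'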
Hardware Requirements
Minimum
- 8GB RAM
- Use 7B or smaller models
Recommended
- 16GB RAM
- Use 7B-13B models
High Performance
- 32GB+ RAM
- Apple Silicon (M1/M2/M3) or NVIDIA GPU
- Can run 70B models
Performance Tips
macOS (Apple Silicon)
Apple Silicon (M1/M2/M3) works very well with Ollama; GPU acceleration via Metal is used automatically, so no extra setup is needed.
Windows (NVIDIA GPU)
Install the latest NVIDIA drivers, and Ollama will use GPU acceleration automatically.
Not Enough RAM
Try:
- Switch to a smaller model
- Close other programs
- Set OLLAMA_NUM_PARALLEL=1 to limit concurrency (see the example below)
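For example, if you start the server yourself rather than relying on the background service, the variable can be set inline (a minimal sketch; with the macOS desktop app you would instead set it via launchctl setenv and restart Ollama):
# Limit Ollama to one request at a time
OLLAMA_NUM_PARALLEL=1 ollama serve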
Common Issues
Ollama Service Not Running
# Start service
ollama serve
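If you installed with Homebrew on macOS, you can also let Homebrew manage the service so it stays running in the background (this assumes the brew formula rather than the desktop app):
# Start Ollama now and at login
brew services start ollama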
Model Download Failed
- Check your network connection
- Try using a proxy (see the example below)
- Manually download the model files
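If a proxy is needed, Ollama respects the standard HTTPS_PROXY environment variable; the proxy address below is only a placeholder:
# Assumption: a local proxy listening on port 7890
HTTPS_PROXY=http://127.0.0.1:7890 ollama pull llama3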
Response Is Slow
- Switch to a smaller model
- Check whether GPU acceleration is working (see below)
- Close resource-heavy programs
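To see whether a loaded model is actually running on the GPU, check the processor column reported by the CLI (a model must be loaded, e.g. right after a chat):
# PROCESSOR should show GPU rather than CPU when acceleration works
ollama ps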
Chinese Characters Display Wrong
Switch to the Qwen2 model, which has better Chinese support.
Local vs Cloud
| Aspect | Ollama Local | Cloud Models |
|--------|--------------|--------------|
| Privacy | Fully local | Data uploaded |
| Cost | Free | Pay per use |
| Speed | Depends on hardware | Generally faster |
| Capability | Medium | Stronger |
| Offline | Yes | Needs internet |
Recommendation: Use Ollama for privacy-sensitive content, cloud models for complex tasks.