Ollama Local Models

Ollama lets you run AI models on your own computer—fully offline, fully private.

What is Ollama

Ollama is an open-source tool for running large language models locally.

Benefits:

  • Completely offline, no internet needed
  • Data never leaves your computer
  • Free to use, no API key needed
  • Supports many open-source models

Limitations:

  • Requires reasonably capable hardware (see Hardware Requirements below)
  • Models are generally less capable than cloud models
  • Models must be downloaded before first use (typically several GB each)

Installing Ollama

macOS

# Using Homebrew
brew install ollama

# Or download installer from
# https://ollama.ai/download

Windows

  1. Go to ollama.ai/download
  2. Download Windows installer
  3. Run the installer

Verify Installation

ollama --version

After installation, Ollama runs as a background service.
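To confirm the service is up, you can query its local HTTP endpoint (a quick check assuming the default port, which is covered in the configuration section below):

# The service should reply with a short "Ollama is running" message
curl http://localhost:11434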

Downloading Models

From Terminal

# Download Llama 3
ollama pull llama3

# Download Mistral
ollama pull mistral

# Download Qwen (good for Chinese)
ollama pull qwen2
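
To see which models are already on disk, and to remove ones you no longer need, the following commands may be useful:

# List locally installed models and their sizes
ollama list

# Delete a model to free disk space
ollama rm mistral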

From MoryFlow

  1. Open Settings → Model Management
  2. Find the Ollama section
  3. Browse available models
  4. Click download

[Screenshot: Ollama Model Download]

General Use

Model         Size    Features          Command
Llama 3 8B    ~4GB    Well-balanced     ollama pull llama3
Mistral 7B    ~4GB    Strong reasoning  ollama pull mistral
Qwen2 7B      ~4GB    Good Chinese      ollama pull qwen2

Lower-end Hardware

Model       Size     Features                Command
Phi-3 Mini  ~2GB     Microsoft, lightweight  ollama pull phi3
Gemma 2B    ~1.5GB   Google                  ollama pull gemma:2b

Powerful Hardware

Model         Size    Features            Command
Llama 3 70B   ~40GB   Near GPT-4 level    ollama pull llama3:70b
Mixtral 8x7B  ~26GB   Mixture of experts  ollama pull mixtral

Configuring in MoryFlow

1. Configure Connection

  1. Open Settings → Model Management
  2. Find Ollama section
  3. Confirm the endpoint address (default http://localhost:11434); a quick way to verify it is reachable is shown below

[Screenshot: Ollama Configuration]
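
As a quick check that the endpoint address is correct, you can query Ollama's REST API directly (a sketch assuming the default port):

# Returns a JSON list of the models installed locally
curl http://localhost:11434/api/tags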

2. Select Model

Once configured, downloaded Ollama models appear in the model selector on the chat panel.

3. Start Using

Select an Ollama model and start chatting. All processing happens locally.
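
Under the hood, MoryFlow sends chat requests to the endpoint configured above. For reference, a raw request to Ollama's /api/chat endpoint looks roughly like this (a sketch; the exact payload MoryFlow sends may differ):

# Send a single chat turn to the local model and wait for the full reply
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Summarize the benefits of local models."}],
  "stream": false
}'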

Hardware Requirements

Minimum

  • 8GB RAM
  • Use 7B or smaller models

Recommended

  • 16GB RAM
  • Use 7B-13B models

High Performance

  • 32GB+ RAM
  • Apple Silicon (M1/M2/M3) or NVIDIA GPU
  • Can run 70B models

Performance Tips

macOS (Apple Silicon)

Apple Silicon (M1/M2/M3) Macs get GPU acceleration through Metal automatically, making them an excellent fit for Ollama.

Windows (NVIDIA GPU)

Install the latest NVIDIA drivers and Ollama will automatically use GPU acceleration.
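
To confirm the GPU is actually being used, you can watch GPU memory and utilization while a model is answering (this assumes the NVIDIA driver utilities are installed):

# Shows GPU memory usage; the ollama process should appear while a model is loaded
nvidia-smi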

Not Enough RAM

Try:

  1. Switch to a smaller model
  2. Close other programs
  3. Set OLLAMA_NUM_PARALLEL=1 to limit concurrent requests (see the example below)
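
One way to apply that setting is to export the variable before starting the service (shown for macOS/Linux shells; on Windows, set it as a system environment variable instead):

# Stop any running instance first, then restart with concurrency limited to one request
OLLAMA_NUM_PARALLEL=1 ollama serve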

Common Issues

Ollama Service Not Running

# Start service
ollama serve
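
If you installed with Homebrew on macOS, you can also let Homebrew manage the background service instead of keeping ollama serve open in a terminal (assuming the Homebrew service definition is available for your installation):

# Start and keep Ollama running as a managed background service
brew services start ollama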

Model Download Failed

  1. Check network connection
  2. Try using a proxy (see the example below)
  3. Manually download model files
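
Ollama honors the standard proxy environment variables during model downloads; a sketch with a placeholder proxy address (replace it with your own):

# Route the download through an HTTPS proxy (the address below is a placeholder)
HTTPS_PROXY=http://127.0.0.1:7890 ollama pull llama3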

Response Is Slow

  1. Switch to smaller model
  2. Check whether GPU acceleration is active (see the check below)
  3. Close resource-heavy programs
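
A quick way to see whether a model is running on the GPU or falling back to the CPU:

# The PROCESSOR column shows how much of the loaded model is on GPU vs CPU
ollama ps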

Garbled or Poor Chinese Output

Some models handle Chinese poorly. Switch to the Qwen2 model, which has better Chinese support.

Local vs Cloud

Aspect      Ollama Local          Cloud Models
Privacy     Fully local           Data uploaded
Cost        Free                  Pay per use
Speed       Depends on hardware   Generally faster
Capability  Medium                Stronger
Offline     Yes                   Needs internet

Recommendation: Use Ollama for privacy-sensitive content, cloud models for complex tasks.