Ollama Evals
Run open source models locally with Ollama and compare them against cloud models in Evvl. Find out if a local model is good enough for your task, or if you need to pay for a cloud API.
Why Evaluate Local Models?
Local models through Ollama are free to run, fully private, and work offline. The tradeoff is that they're generally less capable than the latest cloud models. The question is: how much less capable, and does it matter for your specific task?
For classification, extraction, summarization, and other structured tasks, a local 8B or 70B parameter model might be all you need. For complex reasoning or creative writing, you might need a cloud model. Testing with your actual prompts is the only way to know.
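As a minimal sketch of what "testing with your actual prompts" can look like, the helper below sends a prompt to Ollama's local REST API (the /api/generate endpoint on Ollama's default port 11434). The function names are illustrative, and the call requires a running Ollama server with the model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_generate_request(model: str, prompt: str) -> dict:
    # Payload shape for Ollama's /api/generate endpoint (non-streaming)
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    # Runs one prompt against a local model; requires `ollama serve` running
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_generate_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With this in place, running the same extraction or classification prompt through generate("llama3", ...) and through your cloud provider's SDK gives you a direct answer quality comparison on your own data.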
Common Ollama Model Comparisons
Llama 3 vs GPT-4o-mini
Meta's Llama 3 is one of the strongest open source models. Compare it against GPT-4o-mini (one of the cheapest cloud options) to see if the free local model can match paid performance.
Good for: Deciding if you need a cloud API at all
Model Size Comparison: 8B vs 70B
Smaller models run faster and use less RAM, but larger models are more capable. Test the same prompt across different sizes of the same model to find the smallest size that still gets good results.
Good for: Optimizing for your hardware constraints
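One way to run that size sweep is a tiny accuracy harness: score each size of the same family on a handful of labeled prompts and keep the smallest size that clears your bar. This is a sketch under our own assumptions; the generate_fn parameter stands in for whatever function actually calls the model (local or cloud), and the tag names follow Ollama's model:size convention.

```python
from typing import Callable, List, Tuple

# Candidate sizes of the same model family, smallest first
SIZES = ["llama3:8b", "llama3:70b"]

def accuracy(generate_fn: Callable[[str, str], str],
             model: str,
             examples: List[Tuple[str, str]]) -> float:
    # Fraction of labeled prompts the model answers with an exact match
    correct = sum(
        generate_fn(model, prompt).strip().lower() == label.lower()
        for prompt, label in examples
    )
    return correct / len(examples)

def smallest_good_enough(generate_fn, sizes, examples, threshold=0.9):
    # Return the first (smallest) size that meets the accuracy threshold
    for size in sizes:
        if accuracy(generate_fn, size, examples) >= threshold:
            return size
    return None  # no local size is good enough; consider a cloud model
```

Exact-match scoring only suits structured tasks (classification, extraction); for open-ended outputs you would swap in a softer comparison.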
Coding Models: DeepSeek vs Cloud
DeepSeek Coder and other specialized models can be surprisingly competitive with cloud models on coding tasks. Compare them on your actual code generation and review prompts.
Good for: Development teams evaluating local coding assistants
Local Models for Sensitive Data
When data can't leave your network, local models are your only option. Compare multiple Ollama models to find the best performer for tasks involving confidential information.
Good for: Healthcare, legal, finance, and other regulated industries
How to Evaluate Ollama Models with Evvl
1. Install Ollama and pull models
Download the installer from ollama.com, then run ollama pull llama3 to fetch a model.
2. Open Evvl Desktop
Evvl automatically detects running Ollama instances. No API key needed. (Ollama requires the desktop app due to browser CORS restrictions.)
3. Pick local and cloud models to compare
Select your Ollama models alongside cloud models from OpenAI, Anthropic, or Google for a direct comparison.
4. Compare results side by side
See local and cloud model responses at once, then decide whether the local model is good enough or you need to pay for a cloud API.
Frequently Asked Questions
Can I use Ollama with Evvl's web app?
No. Ollama runs a local API server that browsers can't connect to due to CORS restrictions. You'll need the Evvl desktop app to use Ollama models. The desktop app connects to Ollama directly with no configuration needed.
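For the curious, "detecting a running Ollama instance" amounts to a GET against Ollama's local /api/tags endpoint, which lists the models you've pulled. A sketch of that probe (the helper names are ours; the response shape matches Ollama's API):

```python
import json
import urllib.request

def model_names(tags_payload: dict) -> list:
    # Extract model names from an /api/tags response:
    # {"models": [{"name": "llama3:latest", ...}, ...]}
    return [m["name"] for m in tags_payload.get("models", [])]

def detect_ollama(base_url: str = "http://localhost:11434") -> list:
    # Return locally available models, or [] if no Ollama server is listening
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return model_names(json.loads(resp.read()))
    except OSError:
        return []
```

A browser page can't make this request against localhost because of CORS, which is exactly why this probe has to live in a desktop app.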
Which Ollama model should I start with?
Llama 3 8B is a good starting point. It runs well on most hardware and performs surprisingly well on many tasks. If you have a GPU or unified memory with roughly 40GB or more (enough for a 4-bit quantized 70B model), try Llama 3 70B for better quality. For coding tasks, try DeepSeek Coder.
Can I compare Ollama models with cloud models?
Yes, that's one of Evvl's main strengths. In the desktop app, you can compare Ollama models against GPT, Claude, Gemini, and any other supported provider in the same evaluation.
Compare local vs cloud models
See how Llama, Mistral, and Qwen stack up against GPT-4.1 and Claude on your tasks.
Get Started