Ollama Evals

Run open source models locally with Ollama and compare them against cloud models in Evvl. Find out if a local model is good enough for your task, or if you need to pay for a cloud API.

Why Evaluate Local Models?

Local models through Ollama are free to run, fully private, and work offline. The tradeoff is that they're generally less capable than the latest cloud models. The question is: how much less capable, and does it matter for your specific task?

For classification, extraction, summarization, and other structured tasks, a local 8B or 70B parameter model might be all you need. For complex reasoning or creative writing, you might need a cloud model. Testing with your actual prompts is the only way to know.

Common Ollama Model Comparisons

Llama 3 vs GPT-4o-mini

Meta's Llama 3 is one of the strongest open source models. Compare it against GPT-4o-mini (one of the cheapest cloud options) to see if the free local model can match paid performance.

Good for: Deciding if you need a cloud API at all

Model Size Comparison: 8B vs 70B

Smaller models run faster and use less RAM, but larger models are more capable. Test the same prompt across different sizes of the same model to find the smallest size that still gets good results.

Good for: Optimizing for your hardware constraints
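
As a back-of-the-envelope way to think about the size tradeoff, a model's weight footprint is roughly its parameter count times bits per weight. The sketch below is a rough heuristic (it ignores KV cache, context length, and runtime overhead, and the 4-bit default is an assumption about common Ollama quantizations):

```python
def estimated_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough weight-only memory estimate in GB.

    Ignores KV cache, activations, and runtime overhead,
    so treat the result as a lower bound.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# At a common 4-bit quantization, an 8B model needs roughly
# 4 GB for weights alone; a 70B model roughly 35 GB.
print(f"8B  @ 4-bit: ~{estimated_weight_gb(8):.0f} GB")
print(f"70B @ 4-bit: ~{estimated_weight_gb(70):.0f} GB")
```

If the estimate fits comfortably in your RAM or VRAM with headroom to spare, that size is worth testing first.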

Coding Models: DeepSeek vs Cloud

DeepSeek Coder and other specialized models can be surprisingly competitive with cloud models on coding tasks. Compare them on your actual code generation and review prompts.

Good for: Development teams evaluating local coding assistants

Local Models for Sensitive Data

When data can't leave your network, local models are your only option. Compare multiple Ollama models to find the best performer for tasks involving confidential information.

Good for: Healthcare, legal, finance, and other regulated industries

How to Evaluate Ollama Models with Evvl

  1. Install Ollama and pull models

    Download the installer from ollama.com, then run ollama pull llama3 to fetch a model.

  2. Open Evvl Desktop

    Evvl automatically detects running Ollama instances. No API key needed. (Ollama requires the desktop app due to browser CORS restrictions.)

  3. Pick local and cloud models to compare

    Select your Ollama models alongside cloud models from OpenAI, Anthropic, or Google for a direct comparison.

  4. Compare results side by side

    See local and cloud model responses at once. Decide if the local model is good enough or if you need to pay for a cloud API.
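
Under the hood, a comparison like this boils down to sending the same prompt to each model. Here is a minimal sketch against Ollama's local /api/generate endpoint (the model list, prompt, and generate() helper are illustrative, not Evvl's actual code, and generate() requires a running ollama serve):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def ollama_request(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send one prompt to a locally running Ollama instance.

    Not called below, since it needs `ollama serve` running.
    """
    data = json.dumps(ollama_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

prompt = "Classify the sentiment of: 'The update broke my workflow.'"
for model in ["llama3", "mistral"]:
    print(ollama_request(model, prompt))
```

A cloud model would get the same prompt through its provider's own API shape; Evvl lines the responses up side by side so you can judge them directly.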

Frequently Asked Questions

Can I use Ollama with Evvl's web app?

No. Ollama runs a local API server that browsers can't connect to due to CORS restrictions. You'll need the Evvl desktop app to use Ollama models. The desktop app connects to Ollama directly with no configuration needed.
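
Evvl's desktop app handles detection automatically, but if you want to verify that Ollama is reachable yourself, its server answers plain HTTP on port 11434. A minimal sketch (the helper name is ours, not part of any API):

```python
import urllib.error
import urllib.request

def ollama_is_running(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server responds at base_url.

    Ollama's root endpoint replies with "Ollama is running".
    """
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_is_running())  # False unless `ollama serve` is active
```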

Which Ollama model should I start with?

Llama 3 8B is a good starting point. It runs well on most hardware and performs surprisingly well on many tasks. If you have a GPU with 32GB+ VRAM, try Llama 3 70B for better quality. For coding tasks, try DeepSeek Coder.

Can I compare Ollama models with cloud models?

Yes, that's one of Evvl's main strengths. In the desktop app, you can compare Ollama models against GPT, Claude, Gemini, and any other supported provider in the same evaluation.

Compare local vs cloud models

See how Llama, Mistral, and Qwen stack up against GPT-4.1 and Claude on your tasks.

Get Started