llamafile: Run open-source LLMs locally as a single portable file.

llamafile lets you bundle a full LLM into a single executable: model weights, inference engine, and runtime in one file. Download it once; run it anywhere.

Unlike tools such as Ollama or LocalAI, which require separate installation and a client-server setup, llamafile is the model, runtime, and server in one file.

llamafile is a Mozilla Builders project.

Key enabler of sovereign AI

Run local LLMs privately, offline, and with full control over compute, data, and cost.

Cross-platform & hardware-friendly

Runs on a wide range of operating systems (Windows, macOS, Linux, BSD, and more) without installation or setup.
Supports GGUF models including Llama, Mistral, Gemma, and Phi.
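
As a minimal sketch (the GGUF filename below is a placeholder; any compatible quantized model should work), a llamafile can also load external weights with the -m flag it inherits from llama.cpp:

    # Point the runtime at any compatible GGUF; the weights need not be bundled in.
    ./llamafile -m mistral-7b-instruct-v0.2.Q4_K_M.gguf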

Local-first & privacy-conscious

Models run entirely on your device. No cloud, no data sharing, no external dependencies. Works fully offline for privacy-first AI workflows.
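
As one illustration (assuming the default port of 8080; the model field is a placeholder the local server accepts), a running llamafile exposes an OpenAI-compatible API on localhost, so requests never leave your machine:

    # Chat completion against the local server; no API key, no network egress.
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Summarize llamafile in one sentence."}]
      }'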

Low barrier to adoption

Ideal for developers, hobbyists, and organizations that want to experiment with or deploy open-source LLMs without complex infrastructure, container orchestration, or heavy setup.

Get Started with llamafile

Download a model from our example llamafiles page, make it executable, and run it.
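
For example, on macOS, Linux, or BSD (the URL and filename below are illustrative stand-ins for whichever example llamafile you choose; on Windows, rename the file to end in .exe instead of using chmod):

    # 1. Download an example llamafile (replace the URL with one from the examples page).
    curl -LO https://example.com/llava-v1.5-7b-q4.llamafile
    # 2. Mark it executable.
    chmod +x llava-v1.5-7b-q4.llamafile
    # 3. Run it; the bundled model loads and a local chat interface starts.
    ./llava-v1.5-7b-q4.llamafile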