llamafile: Run open-source LLMs locally as a single portable file.

llamafile lets you bundle a full LLM into a single executable: model weights, inference engine, and runtime all in one file. Download it once; run it anywhere.
Unlike tools such as Ollama or LocalAI, which require a separate installation and a client-server setup, llamafile is the model, the runtime, and the server in one file.
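Because each llamafile is also a server, running one gives you both a browser chat UI and an OpenAI-compatible HTTP API on localhost. A minimal sketch, assuming a llamafile is already running on the default port 8080 (the model name in the request body is illustrative):

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "LLaMA_CPP",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}]
          }'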
llamafile is a Mozilla Builders project.
Key enabler of sovereign AI
Run local LLMs privately, offline, and with full control over compute, data, and cost.
Cross-platform & hardware-friendly
A single binary runs on Windows, macOS, Linux, and the BSDs, on both AMD64 and ARM64, with no installation or setup, and can use a GPU when one is available.
Supports models in GGUF format, including Llama, Mistral, Gemma, and Phi, whether bundled into the executable or loaded as external weights (see the sketch below).
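Fully bundled models are the default, but the same runtime works with weights you already have. A minimal sketch, assuming a bare llamafile runtime binary and a GGUF file in the current directory (the file name is illustrative):

    # Load external GGUF weights with the -m flag instead of bundled ones
    ./llamafile -m mistral-7b-instruct-v0.2.Q4_0.gguf

Keeping weights external is also the usual workaround on Windows, where executables are capped at 4 GB, so larger models ship as separate weight files.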
Local-first & privacy-conscious
Models run entirely on your device. No cloud, no data sharing, no external dependencies. Works fully offline for privacy-first AI workflows.
Low barrier for adoption
Ideal for developers, hobbyists, and organizations that want to experiment with or deploy open-source LLMs without complex infrastructure, container orchestration, or heavy setup.
Get Started with llamafile
Download a model from our example llamafiles page, make it executable, and run it.
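On macOS or Linux, that amounts to three shell commands; the model below is one of the published examples (swap in any llamafile you like):

    # 1. Download a llamafile from the examples page
    wget https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/llava-v1.5-7b-q4.llamafile

    # 2. Mark it executable (on Windows, rename it to end in .exe instead)
    chmod +x llava-v1.5-7b-q4.llamafile

    # 3. Run it; a chat UI opens in your browser at http://localhost:8080
    ./llava-v1.5-7b-q4.llamafile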