I built OpenJet to lower the barrier to running local LLMs optimally. Existing tools make it easy to get started, but their default configurations often leave significant performance on the table unless you manually tune parameters like GPU offload layers or KV cache quantization.
OpenJet solves this by auto-detecting your hardware and dynamically configuring a llama.cpp server with the optimal settings, so you don't have to tweak anything by hand. It also selects the most capable model that fits within your hardware and VRAM.
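To illustrate the kind of logic involved, here is a minimal sketch of a VRAM-based layer-offload heuristic. This is a hypothetical simplification for illustration, not OpenJet's actual algorithm; the function name, the per-layer cost model, and the KV cache/overhead reserves are all assumptions.

```python
def estimate_gpu_layers(free_vram_mb: int, n_layers: int, model_size_mb: int,
                        kv_cache_mb: int = 2048, overhead_mb: int = 1024) -> int:
    """Estimate how many transformer layers to offload to the GPU
    (llama.cpp's --n-gpu-layers). Hypothetical heuristic: reserve room
    for the KV cache and runtime overhead, then fit as many layers as
    the remaining VRAM allows, assuming uniform per-layer weight size."""
    per_layer_mb = model_size_mb / n_layers      # rough per-layer weight cost
    usable_mb = free_vram_mb - kv_cache_mb - overhead_mb
    if usable_mb <= 0:
        return 0                                 # nothing fits: run on CPU
    return min(int(usable_mb // per_layer_mb), n_layers)

# e.g. a 24 GB card with a ~16 GB quantized 48-layer model: full offload fits.
print(estimate_gpu_layers(24576, 48, 16384))
```

A real implementation would also need to account for context length (the KV cache grows with it) and quantized KV cache formats, which is exactly the bookkeeping OpenJet aims to automate.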
Benchmarks: In my testing on an RTX 3090 (240k context, running Qwen3.5-27B-Q4_K_M), OpenJet achieves ~38-40 tok/s entirely out of the box, just by running the install command.
For comparison, the default Ollama configuration yields 16 tok/s on the exact same hardware and prompt. That is a 2.4x performance increase with zero manual configuration.
Key Features:
> Hardware Auto-Detection: Automatically calculates the best inference parameters for your specific CPU/GPU setup.
> CLI Interface: Run inference directly from the terminal (e.g., openjet chat "Hello world").
> TUI & Python SDK: Includes a Terminal UI for chat and an SDK for integration into your own applications.
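Because OpenJet runs a llama.cpp server underneath, any OpenAI-compatible HTTP client can also talk to it directly. A minimal sketch, assuming the server is on llama.cpp's default port 8080 (OpenJet may choose a different port or expose its own SDK wrapper):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       host: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a request against llama.cpp's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# resp = urllib.request.urlopen(build_chat_request("Hello world"))
# print(json.load(resp)["choices"][0]["message"]["content"])
```

This is also roughly what the bundled SDK would abstract away, so existing OpenAI-client code can be pointed at the local server with only a base-URL change.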
The goal is to let developers and enthusiasts get the absolute most out of their hardware without needing to constantly reference llama.cpp documentation to optimize their setup.
GitHub: https://github.com/L-Forster/open-jet
I would appreciate any feedback on the source code, the configuration logic, or suggestions on how to make local LLM setups even more accessible.