I built OpenJet to lower the barrier to running local LLMs optimally. Existing tools make it easy to get started, but their default configurations often leave significant performance on the table unless you manually tune parameters like GPU offload layers or KV cache quantization.
OpenJet solves this by auto-detecting your hardware and dynamically configuring a llama.cpp server with the optimal settings, so you don't have to tweak anything by hand. It also selects the most capable model that fits within your hardware and VRAM.
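To illustrate the kind of logic involved, here is a minimal sketch of a VRAM-based layer-offload heuristic. This is a hypothetical simplification for illustration, not OpenJet's actual algorithm; the function name, the per-layer cost model, and the KV cache/overhead reserves are all assumptions.

```python
def estimate_gpu_layers(free_vram_mb: int, n_layers: int, model_size_mb: int,
                        kv_cache_mb: int = 2048, overhead_mb: int = 1024) -> int:
    """Estimate how many transformer layers to offload to the GPU
    (llama.cpp's --n-gpu-layers). Hypothetical heuristic: reserve room
    for the KV cache and runtime overhead, then fit as many layers as
    the remaining VRAM allows, assuming uniform per-layer weight size."""
    per_layer_mb = model_size_mb / n_layers      # rough per-layer weight cost
    usable_mb = free_vram_mb - kv_cache_mb - overhead_mb
    if usable_mb <= 0:
        return 0                                 # nothing fits: run on CPU
    return min(int(usable_mb // per_layer_mb), n_layers)

# e.g. a 24 GB card with a ~16 GB quantized 48-layer model: full offload fits.
print(estimate_gpu_layers(24576, 48, 16384))
```

A real implementation would also need to account for context length (the KV cache grows with it) and quantized KV cache formats, which is exactly the bookkeeping OpenJet aims to automate.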
Benchmarks: In my testing on an RTX 3090 (240k context, running Qwen3.5-27B-Q4_K_M), OpenJet achieves ~38-40 tok/s entirely out of the box, just by running the install command.
For comparison, the default Ollama configuration yields 16 tok/s on the exact same hardware and prompt. That is a 2.4x performance increase with zero manual configuration.
Key Features:
> Hardware Auto-Detection: Automatically calculates the best inference parameters for your specific CPU/GPU setup.
> CLI Interface: Run inference directly from the terminal (e.g., openjet chat "Hello world").
> TUI & Python SDK: Includes a Terminal UI for chat and an SDK for integration into your own applications.
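Because OpenJet runs a llama.cpp server underneath, any OpenAI-compatible HTTP client can also talk to it directly. A minimal sketch, assuming the server is on llama.cpp's default port 8080 (OpenJet may choose a different port or expose its own SDK wrapper):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       host: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a request against llama.cpp's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# resp = urllib.request.urlopen(build_chat_request("Hello world"))
# print(json.load(resp)["choices"][0]["message"]["content"])
```

This is also roughly what the bundled SDK would abstract away, so existing OpenAI-client code can be pointed at the local server with only a base-URL change.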
The goal is to let developers and enthusiasts get the absolute most out of their hardware without needing to constantly reference llama.cpp documentation to optimize their setup.
GitHub: https://github.com/L-Forster/open-jet
I would appreciate any feedback on the source code, the configuration logic, or suggestions on how to make local LLM setups even more accessible.