I am building a Terminal User Interface (like Claude Code) for self-hosted AI agents on NVIDIA Jetson devices. It works in air-gapped environments.
Unlike other solutions, this is optimised for unified-memory machines, so as to avoid OOM errors.
The agent can read, edit, and create files, and manage and interpret data locally.
Currently, it gets ~17 tok/s on a Jetson Orin Nano 8GB using Qwen3-4B-Instruct-4bit.
Next, I plan to add TensorRT .engine support, which should boost inference speed further. I am also trying to get the memory footprint down, so if anyone has experience with KV cache optimisation, I would love to hear about it.
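For anyone thinking about the KV cache question, here is a back-of-envelope sketch of where the memory goes. The Qwen3-4B dimensions used below (36 layers, 8 KV heads via GQA, head_dim 128) are my assumptions from the public model config, not numbers from this post, so double-check them:

```python
# Rough KV cache size estimator. Model dimensions below are assumed
# (36 layers, 8 KV heads, head_dim 128 for Qwen3-4B) -- verify against
# the actual model config before relying on them.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # Factor of 2 covers both keys and values.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# fp16 cache at an 8k context window:
fp16 = kv_cache_bytes(36, 8, 128, 8192, bytes_per_elem=2)
# An int8-quantised cache halves that footprint:
int8 = kv_cache_bytes(36, 8, 128, 8192, bytes_per_elem=1)
print(f"fp16: {fp16 / 2**20:.0f} MiB, int8: {int8 / 2**20:.0f} MiB")
# → fp16: 1152 MiB, int8: 576 MiB
```

On an 8GB unified-memory board, roughly a gigabyte of fp16 cache at full context is significant next to the ~2-3GB the 4-bit weights take, which is why cache quantisation or shorter context windows help so much.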
I would love to get your feedback, and for people to try running it on more capable devices and models - post your results here.
To try it, run:
pip install open-jet
open-jet --setup
Website: https://www.openjet.dev/
Directly on PyPI: https://pypi.org/project/open-jet/