[Guide] Running Cosmos Reason2 on AGX Orin + JetPack 6.2.2 from Ubuntu 24 — FP8/SM 8.7 fix included

Hey everyone,

I recently went through the full process of getting Cosmos Reason2 2B running on a Jetson AGX Orin 64GB from an Ubuntu 24.04 host machine and wanted to share what I learned, since I hit a few issues that aren’t covered in the official documentation.

The setup problem: SDK Manager doesn’t support Ubuntu 24 as a host OS, so JetPack 6.x versions show as “Cannot execute as super user” in the selector. The fix is to use NVIDIA’s Ubuntu 22.04 Docker image for SDK Manager — but importantly, this image is not available via docker pull. It’s a .tar.gz download from the developer portal that you load locally with docker load. Missing this detail sends you down a frustrating path.

The bigger issue — FP8 on SM 8.7: Once JetPack 6.2.2 was flashed and I followed the Jetson AI Lab tutorial for Cosmos Reason2, inference output was completely incoherent — “somebody somebody somebody…” repeating for the full token length. The root cause is a Marlin kernel gap: the official vLLM wheels include kernels for SM 8.0, 8.6, 8.9, and 9.0 but not SM 8.7, which is what the AGX Orin runs. FP8 falls back to a broken path on Orin specifically. The nvcr.io/nvidia/vllm:26.01-py3 container also doesn’t work on JetPack 6.2.2 — it was built for CUDA 13.0 / JetPack 7.

The fix: Use the W4A16 quantized model embedl/Cosmos-Reason2-2B-W4A16 with Package vllm · GitHub . This takes a completely different kernel path (MarlinLinearKernel for WNA16) that works correctly on SM 8.7, and produces clean coherent inference at 500-800ms per frame.

I’ve written up the complete guide covering the full setup from Ubuntu 24 → JetPack 6.2.2 → Cosmos Reason2 serving → live webcam inference, including all the commands and the explanation of why each issue occurs:

👉 Full guide on Medium: https://medium.com/@shubhamkanitkar32/how-to-run-nvidia-cosmos-reason2-on-jetson-agx-orin-5da1f8aeb742

The end result is Cosmos Reason2 running fully offline on the AGX Orin, correctly identifying a SO-101 robotic arm in a live webcam feed and generating structured spatial reasoning output. Happy to answer any questions here if anyone is working through a similar setup.

we add support Orin in 26.03

You’re a lifesaver! You’ve saved us days and weeks of endlessly debugging. We encountered this issue just yesterday and weren’t sure what the issue was - vram limit, tokenizer issue, model issue etc. - and were trying to solve it without any progress. Thank you very much for posting this.