Hi all!
We see a lot of interest in the Cosmos Reason2 model, which is currently only supported on Jetson AGX Thor and more powerful devices. We wanted to share that we got a quantized Cosmos-Reason2-2B running on the full Jetson lineup, including the Orin Nano 8GB. We're sharing memory and latency numbers, setup instructions, and the practical adjustments we found necessary on these constrained devices.
What’s here:
- Model repo & instructions: https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16
- Setup notes (vLLM + settings)
- Benchmarks
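For anyone who wants a feel for the kind of vLLM settings that tend to matter on a memory-constrained device, here is a minimal sketch. It only builds a `vllm serve` command line and prints it. The flags (`--max-model-len`, `--gpu-memory-utilization`, `--max-num-seqs`) are standard vLLM options, but the helper name `build_vllm_serve_command` and all default values are illustrative assumptions, not the settings from the model repo; please check the repo instructions for the values that were actually tested.

```python
import shlex


def build_vllm_serve_command(
    model: str,
    max_model_len: int = 4096,            # assumed value: shorter context means a smaller KV cache
    gpu_memory_utilization: float = 0.7,  # assumed value: leave headroom on shared CPU/GPU memory
    max_num_seqs: int = 1,                # assumed value: one request at a time on small devices
) -> list[str]:
    """Build an argument list for `vllm serve` (hypothetical helper, for illustration only)."""
    return [
        "vllm", "serve", model,
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--max-num-seqs", str(max_num_seqs),
    ]


cmd = build_vllm_serve_command("embedl/Cosmos-Reason2-2B-W4A16")
# Print a shell-ready command string; nothing is launched here.
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it easy to hand them to `subprocess.run` later, or to tweak a single knob per device when comparing memory use.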
Questions for the community:
- What serving stack do others use for VLMs on Jetson? (vLLM / TensorRT-LLM / custom)
- For vision + reasoning workloads, where do you hit the first bottleneck?
- Are there any further memory optimizations you would recommend for the Nano series?
Thanks!