Real-Time Inference on Thor & RTX: Pi0.5 / GR00T N1.6/1.7 (Thor 23 Hz, RTX 5090 50-80 Hz)

Hi everyone,

I’m an independent developer with a background in algorithms, HPC, and robotics infrastructure. Recently I’ve been working on a lightweight inference engine built around hand-written CUDA kernels, focusing on small-batch and real-time performance (especially for VLA and robotics workloads).
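
To give a rough idea of what "hand-written kernels for small batch" means in practice, here is a simplified, illustrative sketch (not the actual FlashRT code) of a batch-1 GEMV: one warp per output row, lanes striding across the columns, and a shuffle reduction instead of shared memory. This is the kind of shape where generic large-batch GEMM paths leave latency on the table:

```cpp
#include <cuda_runtime.h>

// y = W * x for batch size 1; W is [rows x cols], row-major.
// One warp per output row; lanes stride across the columns.
// Launch e.g. as gemv_b1<<<(rows + 3) / 4, 128>>>(...) so each
// 128-thread block (4 warps) covers four rows.
__global__ void gemv_b1(const float* __restrict__ W,
                        const float* __restrict__ x,
                        float* __restrict__ y,
                        int rows, int cols)
{
    int row  = blockIdx.x * (blockDim.x / 32) + threadIdx.x / 32;
    int lane = threadIdx.x % 32;
    if (row >= rows) return;

    float acc = 0.0f;
    for (int c = lane; c < cols; c += 32)
        acc += W[(size_t)row * cols + c] * x[c];

    // In-warp reduction: no shared memory, no second kernel launch.
    for (int off = 16; off > 0; off >>= 1)
        acc += __shfl_down_sync(0xffffffffu, acc, off);

    if (lane == 0) y[row] = acc;
}
```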

Here are some recent results on Thor and Blackwell (a sketch of how such per-iteration latencies are typically timed follows the list):

  • Pi0.5 — Jetson AGX Thor (SM110): 44 ms (23 Hz)

  • Pi0 — Jetson AGX Thor (SM110): 46 ms (22 Hz)

  • Pi0.5 — RTX 5090 (SM120): 17.58 ms (57 Hz)

  • Pi0 — RTX 5090 (SM120): 18.43 / 21.16 / 24.48 ms (54 / 47 / 41 Hz)

  • GR00T N1.6 — Jetson AGX Thor: 45 ms (T=50) / 41 ms (T=16) → 22 / 24 Hz

  • GR00T N1.6 — RTX 5090: 13.08 ms (T=50) / 12.53 ms (T=16) → 76 / 80 Hz

  • Pi0-FAST (per-token decode)

    • Thor: 8.1 ms/token (123 tok/s)

    • RTX 5090: 2.39 ms/token (418 tok/s)
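
For reference, the ms-per-iteration and Hz figures above are the kind of numbers a CUDA-event harness produces; the sketch below shows the general methodology (a generic illustration, not the actual FlashRT harness; `launch_pipeline` is an assumed placeholder for enqueuing the model's kernels):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Average wall time of a GPU pipeline over `iters` runs, with CUDA
// events bracketing the whole stream. Warm-up runs are excluded so
// JIT, allocator, and clock ramp-up don't pollute the steady state.
float time_pipeline_ms(cudaStream_t stream, int iters)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // for (int i = 0; i < 10; ++i) launch_pipeline(stream);    // warm-up (assumed placeholder)

    cudaEventRecord(start, stream);
    // for (int i = 0; i < iters; ++i) launch_pipeline(stream); // timed runs (assumed placeholder)
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    float ms = total_ms / iters;
    printf("%.2f ms/iter -> %.0f Hz\n", ms, 1000.0f / ms);
    return ms;
}
```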

The focus is on pushing true real-time inference at small batch sizes, a regime that typical large-batch-optimized serving stacks tend to underserve.
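
One technique that matters a lot in this regime is CUDA graph capture: when the kernel sequence is fixed, the steady-state cost per control step becomes a single graph launch instead of hundreds of individual kernel launches. The sketch below shows the general pattern (my generic illustration under stated assumptions, not necessarily how FlashRT does it; `launch_pipeline` is again an assumed placeholder):

```cpp
#include <cuda_runtime.h>

// Capture a fixed inference sequence once, then replay it each control
// step with one cudaGraphLaunch. At batch size 1, per-kernel launch
// overhead is often a large fraction of total latency; replay removes it.
// Note: `stream` must be a non-default stream for capture to succeed.
void run_with_graph(cudaStream_t stream, int steps)
{
    cudaGraph_t graph;
    cudaGraphExec_t exec;

    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    // launch_pipeline(stream);   // enqueue the model's kernels (assumed placeholder)
    cudaStreamEndCapture(stream, &graph);

    cudaGraphInstantiate(&exec, graph, 0);   // CUDA 12 signature

    for (int s = 0; s < steps; ++s) {
        cudaGraphLaunch(exec, stream);   // one cheap launch per step
        cudaStreamSynchronize(stream);   // act on the result, then repeat
    }

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
}
```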

Still early, but happy to share more details or discuss if anyone is working on similar workloads 🙂

Feedback welcome! Repo: https://github.com/LiangSu8899/FlashRT (FlashRT is a high-performance real-time inference engine for small-batch, latency-sensitive AI workloads; the flagship integration is production VLA control for Pi0, Pi0.5, GR00T N1.6, and Pi0-FAST).

Hi,

Thanks a lot for sharing.

We also have a Pi0.5 tutorial below for your reference:

Thanks.

Thanks a lot for sharing the example — it’s a really helpful reference and gave me a lot of inspiration.

I also tried building a version using MLIR-TRT (mainly to better support both JAX and PyTorch):
https://github.com/LiangSu8899/openpi-jax_torch_mlir-trt/tree/main/deployment

With the graph-optimization flexibility MLIR provides, I was able to get the TRT-based Pi0.5 down to around 70 ms.

After that, I experimented further with custom CUDA kernels that remove the standalone quantize/dequantize (Q/DQ) nodes from the pipeline, and in my testing this brings a significant improvement on edge devices.
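
To make the Q/DQ point concrete, here is an illustrative batch-1 GEMV variant (a sketch of the general idea, not my production kernel): the weights stay in int8 and the dequant scale is folded into the kernel's epilogue, so the graph no longer needs standalone quantize/dequantize nodes at all:

```cpp
#include <cstdint>
#include <cuda_runtime.h>

// Batch-1 GEMV with int8 weights and a per-row dequant scale. The
// int8 -> float conversion happens inside the multiply-accumulate and
// the scale is applied once per row, so no separate Q/DQ kernels run.
__global__ void gemv_b1_w8(const int8_t* __restrict__ W,
                           const float*  __restrict__ x,
                           float*        __restrict__ y,
                           const float*  __restrict__ row_scale,
                           int rows, int cols)
{
    int row  = blockIdx.x * (blockDim.x / 32) + threadIdx.x / 32;
    int lane = threadIdx.x % 32;
    if (row >= rows) return;

    float acc = 0.0f;
    for (int c = lane; c < cols; c += 32)
        acc += (float)W[(size_t)row * cols + c] * x[c];  // dequant fused into the MAC

    for (int off = 16; off > 0; off >>= 1)
        acc += __shfl_down_sync(0xffffffffu, acc, off);

    if (lane == 0) y[row] = acc * row_scale[row];        // scale applied in the epilogue
}
```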

Really appreciate the work from your side — it’s been very helpful in exploring these optimizations.