High Inference Time on Jetson AGX (ONNX Runtime GPU)

I’ve exported the NVIDIA TAO FoundationStereo model to ONNX and deployed it on a Jetson AGX device. However, I’m observing an inference time of around 2 minutes per frame when generating disparity maps, even though the benchmark mentioned in the GitHub repo indicates ~1 second inference on Jetson AGX.

I have:

  • CUDA version: 12.8

  • ONNX Runtime GPU version: 1.23.0

It seems the model is falling back to the CPU execution provider instead of running on the GPU, despite my using the GPU-enabled ONNX Runtime package.
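One way to confirm this is to check which execution providers the installed ONNX Runtime build exposes and which ones the session actually selects. Below is a minimal sketch: the provider-selection helper is plain Python, and the commented-out part shows how it would plug into `onnxruntime` (the model filename there is a placeholder, not from this thread):

```python
# Sketch: build a provider list that prefers GPU execution providers
# and falls back to CPU, for use with onnxruntime's InferenceSession.

def pick_providers(available):
    """Return the preferred execution providers, GPU first, CPU last."""
    preferred = [
        "TensorrtExecutionProvider",  # TensorRT EP, fastest on Jetson when available
        "CUDAExecutionProvider",      # CUDA EP
        "CPUExecutionProvider",       # always-available fallback
    ]
    return [p for p in preferred if p in available]

# With onnxruntime-gpu installed, the check would look like:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   sess = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
#   print(sess.get_providers())
# If only CPUExecutionProvider is printed, the wheel was built without
# CUDA support or the CUDA libraries failed to load at runtime.
```

If `get_available_providers()` never lists `CUDAExecutionProvider`, the problem is the wheel itself rather than the session configuration.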

Could someone please guide me on how to ensure GPU acceleration is used and reduce inference time to match the expected performance?

Thanks in advance for your help!

Hello,

Thanks for visiting the NVIDIA Developer Forums.
To ensure better visibility and support, I’ve moved your post to the Jetson category, where it’s more appropriate.

Cheers,
Tom

Hi,

Did you upgrade CUDA manually?
If CUDA 12.6 is an option for you, could you try the onnxruntime package in the link below:

Thanks.

Yes, I installed onnxruntime-gpu using the .whl file from the official ONNX Runtime documentation.

Hi,

Could you try the package from the link shared above instead?
It’s hosted on the jetson-ai-lab server.

Thanks.