I’ve tested the NVIDIA TAO FoundationStereo model exported to ONNX and deployed on a Jetson AGX device. However, I’m observing an inference time of around 2 minutes per frame when generating disparity maps, even though the benchmark in the GitHub repo indicates ~1 second per frame on Jetson AGX.
My setup:
- CUDA version: 12.8
- ONNX Runtime GPU version: 1.23.0
It seems the session is falling back to the CPU execution provider instead of running on the GPU, despite my using the GPU-enabled ONNX Runtime package.
Could someone please guide me on how to ensure GPU acceleration is actually used, so the inference time matches the expected performance?
Thanks in advance for your help!