--useCudaGraph using Python API?

When I benchmark inference time with my model, I get better latency when I pass the --useCudaGraph flag to trtexec. Since this is an inference flag, I assume it doesn't affect the saved engine itself. So how do I get the same speedup when running the saved engine through the TensorRT Python API?
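For reference, my timing run looks something like this (the engine path here is just a placeholder):

```
trtexec --loadEngine=model.engine --useCudaGraph
```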

Hi,

Please refer to the following document, which may help you:

Thank you.

Thanks for the link, but the only thing I can find about using CUDA graphs is in the C++ API: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation. Does that mean the Python API for using CUDA graphs is not yet implemented?

Hi,

TensorRT does not own the CUDA graph feature; it is provided by CUDA itself. Please refer to the following document for more information. We are not sure about a Python API for it.

Thank you.

Thanks for pointing me in the right direction! For other people reading this topic: pycuda does not yet support graph execution, but people are working on it.

There is also cuda-python, NVIDIA's own CUDA Python wrapper, which does seem to have graph support: cuda - CUDA Python 12.0.0 documentation. So I'll investigate that next.
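In case it saves someone some digging, here is a minimal, untested sketch of what graph capture around a TensorRT inference call might look like with cuda-python 12.x and the TensorRT 8.x Python API. It mirrors the C++ pattern from the Developer Guide; `context` (an IExecutionContext) and `bindings` (a list of device buffer addresses for execute_async_v2) are assumed to already be set up:

```python
from cuda import cudart

def check(err, *ret):
    # Every cuda-python call returns an (error, value...) tuple.
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"CUDA error: {err}")
    return ret[0] if len(ret) == 1 else ret

stream = check(*cudart.cudaStreamCreate())

# Warm-up enqueue: TensorRT may allocate resources on the first call,
# and allocations are not allowed while a stream capture is active.
context.execute_async_v2(bindings, int(stream))
check(*cudart.cudaStreamSynchronize(stream))

# Capture one inference call into a CUDA graph.
check(*cudart.cudaStreamBeginCapture(
    stream, cudart.cudaStreamCaptureMode.cudaStreamCaptureModeGlobal))
context.execute_async_v2(bindings, int(stream))
graph = check(*cudart.cudaStreamEndCapture(stream))
graph_exec = check(*cudart.cudaGraphInstantiate(graph, 0))

# From now on, replay the captured graph instead of re-enqueueing the
# whole network each time; this is the speedup --useCudaGraph gives.
check(*cudart.cudaGraphLaunch(graph_exec, stream))
check(*cudart.cudaStreamSynchronize(stream))
```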

Edit: sadly, cuda-python requires CUDA 11.0 or newer, which is not available in JetPack 4.6.3, the newest JetPack supported on the Jetson TX2 and Jetson Nano. So cuda-python cannot be used for graph execution there.
