cudaMemcpy and other CUDA operations affecting performance of TensorRT execution

We have some CUDA operations running in a different CPU thread that affect TensorRT execution speed, whether TensorRT and the CUDA operations are on the same stream or on different streams.

Hi,

CUDA and TensorRT both use the GPU for computation.
So it's expected that a CUDA app can affect the performance of TensorRT inference.
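Since the contention comes from sharing one GPU, two common mitigations are giving the inference stream a higher priority than the auxiliary traffic, and using pinned host memory so `cudaMemcpyAsync` can overlap with compute instead of serializing through a staging copy. The sketch below is not from this thread; the buffer sizes and the TensorRT call site are hypothetical placeholders:

```cpp
// Hedged sketch: prioritize the inference stream over side-channel copies.
// Note: stream priorities influence scheduling of new work; they do not
// preempt kernels or copies already in flight.
#include <cuda_runtime.h>

int main() {
    int leastPriority, greatestPriority;  // numerically lower = higher priority
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t inferStream, copyStream;
    // TensorRT enqueue work would go on the highest-priority stream...
    cudaStreamCreateWithPriority(&inferStream, cudaStreamNonBlocking, greatestPriority);
    // ...and the auxiliary cudaMemcpy traffic on the lowest-priority one.
    cudaStreamCreateWithPriority(&copyStream, cudaStreamNonBlocking, leastPriority);

    // Pinned (page-locked) host memory lets cudaMemcpyAsync truly overlap
    // with compute; pageable memory forces an extra synchronous staging step.
    const size_t bytes = 1 << 20;  // hypothetical buffer size
    void *hostBuf = nullptr, *devBuf = nullptr;
    cudaMallocHost(&hostBuf, bytes);
    cudaMalloc(&devBuf, bytes);
    cudaMemcpyAsync(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice, copyStream);

    // context->enqueueV3(inferStream) would go here (TensorRT call site,
    // assumed for illustration).

    cudaStreamSynchronize(copyStream);
    cudaFreeHost(hostBuf);
    cudaFree(devBuf);
    cudaStreamDestroy(inferStream);
    cudaStreamDestroy(copyStream);
    return 0;
}
```

This only reduces interference; it cannot eliminate it, since SMs, memory bandwidth, and copy engines are still shared.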

Thanks.

Would it help if we moved the operations to other hardware like the DLA, PVA, or VIC, since they would no longer be on the same GPU device? Any suggestions to help mitigate this problem would be great.

Hi, as per the Jetson datasheet, using the VIC and PVA through VPI would free the GPU and CPU for other tasks, so they wouldn't be on the same device. Or will the same serialization of operations across CUDA streams still come into play?

Hi,

It’s possible.

For inference, you can offload tasks to the DLA, as it is a dedicated inference engine.
You can find some examples in the below two repositories:

Thanks.
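Targeting the DLA is done at engine-build time through TensorRT's builder config. A minimal sketch, assuming a TensorRT version with the `IBuilderConfig` DLA APIs; the choice of core 0 and the helper name `configureForDLA` are illustrative:

```cpp
// Hedged sketch: build a TensorRT engine that runs on a DLA core instead of
// the GPU, falling back to the GPU only for unsupported layers.
#include <NvInfer.h>

void configureForDLA(nvinfer1::IBuilder& builder, nvinfer1::IBuilderConfig& config) {
    if (builder.getNbDLACores() == 0) {
        return;  // this device has no DLA; keep the default GPU target
    }
    config.setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config.setDLACore(0);  // Jetson AGX-class parts expose cores 0 and 1
    // Layers the DLA cannot run are placed back on the GPU, so some GPU
    // contention can remain if the network is not fully DLA-compatible.
    config.setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
    // The DLA requires reduced precision (FP16 or INT8).
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
}
```

The same effect is available from the command line with `trtexec --useDLACore=0 --allowGPUFallback --fp16`.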

Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.