Does TensorRT use multiple cores for inference on a single input?

When we run inference on a single image input using a TensorRT inference engine built from a CNN model, does TensorRT use multiple threads to execute a particular layer, or does the whole inference operation run on a single thread?
Also, how can I check the number of threads created, or the number of GPU cores used, with the Visual Profiler?
Since I am very new to CUDA and TensorRT, it would be great if someone could help me get started.

Hi,

TensorRT itself does not spawn multiple CPU threads; CPU threading is left to the user.
For single-image inference, only one CPU thread runs inside TensorRT. Internally, however, the CUDA kernels that perform the CNN operations execute across many GPU cores in parallel.
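For context, here is a minimal sketch of what single-image inference looks like on the host side, assuming the TensorRT 7+ C++ API and a pre-serialized engine; the file name `engine.trt` and the buffer sizes are placeholders, not anything from your setup. Everything below happens on one CPU thread; the parallelism lives inside the CUDA kernels that `enqueueV2` launches:

```cpp
// Minimal single-threaded TensorRT inference sketch (TensorRT 7+ API assumed).
// "engine.trt" and the buffer sizes below are illustrative placeholders.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

int main() {
    // Read the serialized engine from disk.
    std::ifstream file("engine.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Device buffers for one input and one output binding
    // (sizes assume a 3x224x224 input and a 1000-class output).
    void* bindings[2];
    cudaMalloc(&bindings[0], 3 * 224 * 224 * sizeof(float));
    cudaMalloc(&bindings[1], 1000 * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // One host thread enqueues the whole network; each layer's CUDA
    // kernel then runs across many GPU cores in parallel.
    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    // (Copying input/output data and cleanup omitted for brevity.)
    return 0;
}
```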

Please refer to the link below:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#nvprof
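For example (with `my_trt_app` standing in for your own inference binary), running `nvprof --print-gpu-trace ./my_trt_app` lists every CUDA kernel launched during inference along with its grid and block dimensions, and `nvprof --print-api-trace ./my_trt_app` shows which host thread issued each CUDA API call. You can also export a profile with `nvprof -o trt_profile.nvprof ./my_trt_app` and open it in the Visual Profiler (nvvp) to inspect the timeline.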

Thanks