Running TensorRT concurrently

I saw a response that said "Enqueue is an asynchronous call, and you can launch multiple enqueue jobs with different buffers concurrently." But in my project, plain CUDA kernels can run concurrently while the TensorRT inferences cannot. I don't know why, and how can I deal with it?

Hi,

Do you enqueue the buffers with different CUDA streams?
Please note that you will need to use different streams to make them run in parallel.
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_execution_context.html#aaf0b8fd7435076a3e7dd89bd19b0f1c9
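For example, something along these lines (a minimal sketch only; `engine` is assumed to be your deserialized explicit-batch engine, and `buffersA`/`buffersB` are assumed to be arrays of device pointers matching its bindings):

```cpp
// Minimal sketch: two execution contexts enqueued on two CUDA streams
// with separate buffer sets, so the inferences can overlap on the GPU.
#include <NvInfer.h>
#include <cuda_runtime_api.h>

void inferConcurrently(nvinfer1::ICudaEngine* engine,
                       void** buffersA, void** buffersB)
{
    nvinfer1::IExecutionContext* ctxA = engine->createExecutionContext();
    nvinfer1::IExecutionContext* ctxB = engine->createExecutionContext();

    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    // enqueueV2() returns immediately; each inference is issued to its
    // own stream, so the GPU is free to run them in parallel.
    ctxA->enqueueV2(buffersA, streamA, nullptr);
    ctxB->enqueueV2(buffersB, streamB, nullptr);

    // Synchronize each stream before reading its outputs.
    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    ctxA->destroy();
    ctxB->destroy();
}
```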

Thanks.

Of course: different CUDA streams, different buffers, different contexts. And I have seen many issues like this, for example Can't infer concurrently · Issue #1218 · NVIDIA/TensorRT · GitHub

Hi,

May I know the complexity of your model?
Since Jetson has relatively limited resources, CUDA tasks may need to wait for the GPU in turn.

You will also need to use multi-threading within the same process, since each process creates its own CUDA context.
GPU resources for different CUDA contexts are time-sliced, which means kernels from different contexts can't run in parallel:
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#multiple-contexts
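In code, that pattern looks roughly like this (a sketch only; `worker` and `runInOneProcess` are illustrative names, and each thread is assumed to have its own pre-allocated device buffers):

```cpp
// Minimal sketch: several worker threads inside one process, so they
// all share a single CUDA context and their kernels can overlap.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <thread>
#include <vector>

void worker(nvinfer1::IExecutionContext* ctx, void** buffers)
{
    // Each thread uses its own stream for its own execution context.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    ctx->enqueueV2(buffers, stream, nullptr);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}

void runInOneProcess(nvinfer1::ICudaEngine* engine,
                     std::vector<void**> const& perThreadBuffers)
{
    // Create one execution context per thread up front in the main
    // thread, then launch the workers.
    std::vector<nvinfer1::IExecutionContext*> contexts;
    std::vector<std::thread> threads;
    for (void** buffers : perThreadBuffers)
    {
        contexts.push_back(engine->createExecutionContext());
        threads.emplace_back(worker, contexts.back(), buffers);
    }
    for (auto& t : threads)
        t.join();
    for (auto* ctx : contexts)
        ctx->destroy();
}
```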

Thanks.

The model is complex, like YOLOv4, but I am using a desktop GPU, a 2080 Ti. Is the reason still resource constraints?

Hi,

Sorry, that suggestion was based on the Jetson embedded platform.
Since you are using a desktop GPU, please post your question on the desktop board instead.

Thanks.

Thank you for your patience.
