I saw a reply that said "Enqueue is an asynchronous call, and you can launch multiple enqueue jobs with different buffers concurrently." But in my project the kernel functions can run concurrently while the TensorRT inferences cannot. I don't know why, and how can I deal with it?
Do you enqueue the buffers with different CUDA streams?
Please note that you need to use a separate stream for each inference to make them run in parallel.
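The "separate stream per inference" advice above (together with separate buffers and a separate execution context per thread, as discussed later in this thread) can be sketched as follows. This is a hypothetical skeleton only: the actual TensorRT and CUDA calls need a GPU, so they are replaced with placeholder names here (`FakeStream`, `fake_enqueue`). With the real API, each thread would call `engine.create_execution_context()` once and then `context.execute_async_v2(bindings, stream.handle)` on its own stream.

```python
# Sketch of the "one thread + one stream + one execution context" pattern.
# FakeStream and fake_enqueue are placeholders for the real GPU objects
# (e.g. a pycuda stream and IExecutionContext.execute_async_v2).
import threading

class FakeStream:
    """Stands in for a per-thread CUDA stream."""
    def __init__(self, idx):
        self.idx = idx

results = []
lock = threading.Lock()

def fake_enqueue(ctx_id, stream, buffer):
    # Stands in for an async enqueue on this thread's own stream.
    return f"ctx{ctx_id}/stream{stream.idx} ran on {buffer}"

def worker(ctx_id, buffer):
    # Each thread owns its stream, its buffers, and its execution context.
    # Sharing one execution context across threads serializes the inferences.
    stream = FakeStream(ctx_id)
    out = fake_enqueue(ctx_id, stream, buffer)
    with lock:
        results.append(out)

threads = [threading.Thread(target=worker, args=(i, f"buf{i}")) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

The key point is the ownership structure, not the placeholder bodies: nothing GPU-side is shared between the workers except the (thread-safe, deserialized) engine itself.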
Of course: different CUDA streams, different buffers, different contexts. And I have seen many issues like this, for example Can't infer concurrently · Issue #1218 · NVIDIA/TensorRT · GitHub
May I know the complexity of your model?
Since Jetson has relatively limited resources, CUDA tasks may need to wait for the GPU in turn.
You will also need to use multiple threads within the same process, since one process creates one CUDA context.
The GPU resources for different CUDA contexts are time-sliced, which means kernels from different contexts cannot run in parallel.
The model is complex, similar to YOLOv4, but I am using a 2080 Ti GPU. Could resource constraints still be the reason?
Sorry, the suggestion above was based on the Jetson embedded platform.
Since you are using a desktop GPU, please post your question on the desktop board instead:
Thank you for your patience.