Hi,
The links below might be useful to you:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
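As a rough illustration of what those pages cover, here is a minimal sketch of creating two streams with different priorities using the CUDA runtime API. The kernel `scale`, the buffer names, and the sizes are hypothetical placeholders, not something from your application. Note that priorities are only a scheduling hint, and on devices without priority support both range values are reported as 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print a message and exit main() if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            return 1;                                                  \
        }                                                              \
    } while (0)

// Hypothetical placeholder kernel: scales an array in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Query the supported range. Numerically lower values mean higher priority,
    // so greatestPriority is less than or equal to leastPriority.
    int leastPriority = 0, greatestPriority = 0;
    CHECK(cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority));

    // One stream for latency-sensitive work, one for background work.
    cudaStream_t highStream, lowStream;
    CHECK(cudaStreamCreateWithPriority(&highStream, cudaStreamNonBlocking,
                                       greatestPriority));
    CHECK(cudaStreamCreateWithPriority(&lowStream, cudaStreamNonBlocking,
                                       leastPriority));

    const int n = 1 << 20;
    float *a = nullptr, *b = nullptr;
    CHECK(cudaMalloc(&a, n * sizeof(float)));
    CHECK(cudaMalloc(&b, n * sizeof(float)));
    CHECK(cudaMemsetAsync(a, 0, n * sizeof(float), highStream));
    CHECK(cudaMemsetAsync(b, 0, n * sizeof(float), lowStream));

    // Launch work on both streams; the scheduler may favor highStream
    // when blocks from both kernels are waiting to run.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, lowStream>>>(b, n, 2.0f);
    scale<<<blocks, threads, 0, highStream>>>(a, n, 2.0f);
    CHECK(cudaGetLastError());

    CHECK(cudaStreamSynchronize(highStream));
    CHECK(cudaStreamSynchronize(lowStream));

    CHECK(cudaFree(a));
    CHECK(cudaFree(b));
    CHECK(cudaStreamDestroy(highStream));
    CHECK(cudaStreamDestroy(lowStream));
    return 0;
}
```

This should build with something like `nvcc example.cu -o example`, assuming a CUDA toolkit is installed.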
For multi-threading or multi-stream use cases, we suggest using DeepStream or Triton Inference Server.
For more details, we recommend raising the query in the DeepStream forum,
or
in the Issues section of the Triton Inference Server GitHub repository.
Thanks!