How to run inference with multiple models in parallel? Any demo code?
Hi,
The link below might be useful for you:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
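As a starting point, here is a minimal sketch of the CUDA stream pattern described in that doc: one stream per model so their GPU work can overlap. The `fakeInference` kernel is just a placeholder for illustration; in a real setup you would enqueue each model's inference (e.g., a per-model TensorRT execution context) on its own stream instead.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for one model's inference work; a real application would
// launch that model's kernels (or a TensorRT context) on the stream.
__global__ void fakeInference(float* data, int n, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= scale;
}

int main() {
    const int n = 1 << 20;
    float *dA, *dB;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));

    // One stream per "model" so the two workloads can run concurrently.
    cudaStream_t sA, sB;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);

    dim3 block(256), grid((n + 255) / 256);
    fakeInference<<<grid, block, 0, sA>>>(dA, n, 2.0f);  // "model A"
    fakeInference<<<grid, block, 0, sB>>>(dB, n, 0.5f);  // "model B"

    // Wait for both models to finish before cleaning up.
    cudaStreamSynchronize(sA);
    cudaStreamSynchronize(sB);

    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaFree(dA);
    cudaFree(dB);
    printf("both streams done\n");
    return 0;
}
```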
For multi-threading/streaming, we suggest using DeepStream or Triton.
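If you go the Triton route, it loads every model in its model repository and serves them concurrently out of the box. A rough sketch of a two-model repository, assuming TensorRT engines and hypothetical model names (`detector`, `classifier`):

```
model_repository/
├── detector/
│   ├── config.pbtxt
│   └── 1/
│       └── model.plan     # e.g., a TensorRT engine
└── classifier/
    ├── config.pbtxt
    └── 1/
        └── model.plan
```

```
# detector/config.pbtxt (hypothetical example)
name: "detector"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  { count: 2, kind: KIND_GPU }   # two instances handle requests in parallel
]
```

Starting the server with `tritonserver --model-repository=/path/to/model_repository` serves both models at the same time; `instance_group` controls how many copies of each model run concurrently on the GPU.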
For more details, we recommend raising the query on the DeepStream forum, or opening an issue in the Triton Inference Server GitHub repository.
Thanks!