TensorRT: Calling enqueue() from multiple host threads

The enqueue() function takes a cudaEvent_t as an input, which tells the caller when it is OK to refill the inputs.

Is there some sort of signal that informs the caller when it is ok to call enqueue() again? Does the caller need to wait until the previous call to enqueue is complete? Or can enqueue() be called simultaneously from two different host threads with two different sets of input and output memory? Does it matter whether the calls to enqueue() use the same or different streams? Do the separate threads need to wait for the cudaEvent_t signalling that the inputs may be refilled again?


enqueue() is an asynchronous call, and you can launch multiple enqueue jobs with different buffers concurrently.
Jobs in the same CUDA stream are executed in sequence; there is no ordering constraint between jobs in different streams.
The event is for callers who want to know when the input buffers are ready for reuse.
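
To make the ordering rule concrete, here is a small host-only C++ model. It is not TensorRT or CUDA code: ModelStream and runTwoThreads are hypothetical names invented for this sketch, and a plain worker thread stands in for the GPU. It only illustrates the behaviour described above, assuming each host thread uses its own stream and its own buffers: enqueue returns immediately, jobs in one stream run in submission order, and the two streams do not wait on each other.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: a FIFO queue drained by a single
// worker thread, so jobs run strictly in the order they were submitted.
class ModelStream {
public:
    ModelStream() : worker_([this] { run(); }) {}
    ~ModelStream() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_all();
        worker_.join();
    }
    // Asynchronous, like enqueue(): returns right away, the job runs later.
    void enqueue(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(m_); jobs_.push(std::move(job)); ++pending_; }
        cv_.notify_all();
    }
    // Blocks until every job submitted so far has finished.
    void synchronize() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
            { std::lock_guard<std::mutex> lock(m_); --pending_; }
            cv_.notify_all();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    int pending_ = 0;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Two host threads, each with its own stream and its own output buffer.
std::vector<std::vector<int>> runTwoThreads(int jobsPerThread) {
    assert(jobsPerThread >= 0);
    std::vector<std::vector<int>> outputs(2);
    std::vector<std::thread> hosts;
    for (int t = 0; t < 2; ++t) {
        hosts.emplace_back([t, jobsPerThread, &outputs] {
            ModelStream stream;
            for (int i = 0; i < jobsPerThread; ++i)
                stream.enqueue([t, i, &outputs] { outputs[t].push_back(i); });
            stream.synchronize();  // wait for this thread's jobs only
        });
    }
    for (auto& h : hosts) h.join();
    return outputs;
}
```

Calling runTwoThreads(5) returns two buffers, each holding its jobs in submission order, regardless of how the two host threads interleave. Because the buffers are separate, neither thread has to wait for the other before enqueueing.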

You can find more information about the enqueue() function in our documentation:
In a typical production case, TensorRT will execute asynchronously. The enqueue() method will add kernels to a CUDA stream specified by the application, which may then wait on that stream for completion. The fourth parameter to enqueue() is an optional cudaEvent which will be signaled when the input buffers are no longer in use and can be refilled.

Can TensorRT support multithreading? I mean with only one context constructed.


If you are using context.enqueue(), only one context is created.

I mean, if I want to use multiple threads to execute TensorRT, can I do that with enqueue() or execute()?

If I use one context and multiple threads to execute TensorRT, will that work?


You can push images to enqueue(), and TensorRT will execute them in order.
This should be similar to the multi-threaded use case.