enqueue() and enqueueV2() include the following warning in their documentation:
"Calling enqueueV2() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream."
enqueueV3()'s documentation does not. Should it? Are there any locking or other performance concerns I should be aware of?
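For context, here is a minimal sketch of the pattern I assume the warning is prescribing, applied to enqueueV3(): one IExecutionContext per CUDA stream. It assumes a built nvinfer1::ICudaEngine and pre-allocated per-stream device buffers; "input" and "output" are placeholder tensor names, and error handling is omitted.

    // One-context-per-stream pattern with enqueueV3().
    #include <NvInfer.h>
    #include <cuda_runtime_api.h>
    #include <memory>
    #include <vector>

    void inferConcurrently(nvinfer1::ICudaEngine& engine,
                           std::vector<void*> const& inputDev,   // one input buffer per stream
                           std::vector<void*> const& outputDev,  // one output buffer per stream
                           int numStreams)
    {
        std::vector<std::unique_ptr<nvinfer1::IExecutionContext>> contexts;
        std::vector<cudaStream_t> streams(numStreams);

        for (int i = 0; i < numStreams; ++i)
        {
            // One context per stream, per the documented guidance for enqueueV2().
            contexts.emplace_back(engine.createExecutionContext());
            cudaStreamCreate(&streams[i]);

            // enqueueV3() reads tensor addresses from the context rather than
            // taking a bindings array; tensor names here are placeholders.
            contexts[i]->setTensorAddress("input", inputDev[i]);
            contexts[i]->setTensorAddress("output", outputDev[i]);
        }

        // Each enqueueV3() call uses its own context, so no context is ever
        // enqueued on two streams concurrently.
        for (int i = 0; i < numStreams; ++i)
        {
            contexts[i]->enqueueV3(streams[i]);
        }

        for (int i = 0; i < numStreams; ++i)
        {
            cudaStreamSynchronize(streams[i]);
            cudaStreamDestroy(streams[i]);
        }
    }

Note that the buffers are also per-stream in this sketch, since sharing one set of device buffers across concurrently executing streams would race regardless of how the contexts are arranged. My question is whether this one-context-per-stream restriction still applies to enqueueV3(), and if so, whether the docs should say so.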