IExecutionContext and multiple streams

Is it an acceptable use case for a single IExecutionContext to enqueue work on multiple streams? If it is, things seem to be terribly broken. If it is not, TensorRT should at the very least assert or, better still, change the API so it’s not possible to shoot yourself in the foot.

For example, running this is fine:

IExecutionContext* context = ...;

cudaStream_t stream1;
cudaStreamCreate(&stream1);
context->enqueue(1, bindings1, stream1, nullptr);
context->enqueue(1, bindings2, stream1, nullptr);
cudaStreamSynchronize(stream1);

// Results in the output look fine

This is fine too:

IExecutionContext* context = ...;

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
context->enqueue(1, bindings1, stream1, nullptr);
cudaStreamSynchronize(stream1);

// Enqueue after synchronize on stream1
context->enqueue(1, bindings2, stream2, nullptr);
cudaStreamSynchronize(stream2);

// Results look okay

But, running this gives garbage on the outputs:

IExecutionContext* context = ...;

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
context->enqueue(1, bindings1, stream1, nullptr);
context->enqueue(1, bindings2, stream2, nullptr);
cudaStreamSynchronize(stream1);
cudaStreamSynchronize(stream2);

// Results are garbage

Thoughts?

I ran into the same problem. Did you manage to solve it?

I think each CUDA stream should have its own IExecutionContext.
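
To illustrate why that would help: as far as I understand it, an execution context owns the scratch memory used for intermediate activations, so two enqueues that overlap on different streams can overwrite each other's intermediates. Below is a toy model of that in plain C++, with no TensorRT or CUDA involved. ToyContext, step1, step2, runOverlapped, sharedContext and contextPerStream are all made-up names, and the fixed interleaving only stands in for what two streams might do. In real code the equivalent would presumably be calling engine->createExecutionContext() once per stream.

```cpp
#include <cassert>
#include <cstdio>
#include <utility>

// Toy stand-in for an execution context (hypothetical, not the TensorRT type):
// it owns one scratch slot that every inference routed through it reuses.
struct ToyContext {
    int scratch;
};

// Toy two-step "network": step 1 writes an intermediate into the context's
// scratch slot, step 2 reads that slot and produces the output.
void step1(ToyContext& ctx, int input) { ctx.scratch = input * 2; }
int step2(const ToyContext& ctx) { return ctx.scratch + 1; }

// Interleave two inferences the way two independent streams may:
// both first steps run before either second step.
std::pair<int, int> runOverlapped(ToyContext& ctxA, ToyContext& ctxB,
                                  int inputA, int inputB) {
    step1(ctxA, inputA);
    step1(ctxB, inputB);
    int outA = step2(ctxA);
    int outB = step2(ctxB);
    return {outA, outB};
}

// One context shared by both "streams": the second step1 overwrites the
// intermediate of the first inference before it has been consumed.
std::pair<int, int> sharedContext(int inputA, int inputB) {
    ToyContext shared{0};
    return runOverlapped(shared, shared, inputA, inputB);
}

// One context per "stream": each inference keeps its own intermediate.
std::pair<int, int> contextPerStream(int inputA, int inputB) {
    ToyContext ctx1{0};
    ToyContext ctx2{0};
    return runOverlapped(ctx1, ctx2, inputA, inputB);
}

// Prints both outcomes for inputs 3 and 10 (correct answers are 7 and 21).
void printDemo() {
    std::pair<int, int> shared = sharedContext(3, 10);
    std::pair<int, int> separate = contextPerStream(3, 10);
    assert(separate.first == 7 && separate.second == 21);
    std::printf("shared context:     %d %d\n", shared.first, shared.second);
    std::printf("context per stream: %d %d\n", separate.first, separate.second);
}
```

With the shared context the first inference returns the second one's result, which is the toy version of the garbage outputs in the third snippet above. The first two snippets stay correct in this model because the two inferences never overlap there.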