for (int s = 0; s < streams; ++s)
    Iteration* iteration = new Iteration(offset + s, inference, *iEnv.context[offset], *iEnv.bindings[offset]);  // same context[offset] for every stream s
The same context is used for multiple streams, which according to the documentation for enqueueV2() is undefined behavior:
Calling enqueueV2() in from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream.
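To illustrate what the documented fix would look like, here is a minimal sketch of the per-stream pattern. It assumes TensorRT and the CUDA runtime are available and that `engine` is an already-built `nvinfer1::ICudaEngine*` with one binding array prepared per stream; the function name and parameters are hypothetical, not from trtexec itself.

```cpp
#include <vector>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

// Sketch: run inference concurrently on several CUDA streams,
// creating one IExecutionContext per stream as the docs require.
void inferConcurrently(nvinfer1::ICudaEngine* engine,
                       std::vector<std::vector<void*>>& bindings, // one binding set per stream
                       int numStreams)
{
    std::vector<nvinfer1::IExecutionContext*> contexts(numStreams);
    std::vector<cudaStream_t> streams(numStreams);

    // One execution context per stream -- never share a context across streams.
    for (int s = 0; s < numStreams; ++s)
    {
        contexts[s] = engine->createExecutionContext();
        cudaStreamCreate(&streams[s]);
    }

    // Each enqueueV2() call uses its own context, so the enqueues
    // may legally overlap on different streams.
    for (int s = 0; s < numStreams; ++s)
    {
        contexts[s]->enqueueV2(bindings[s].data(), streams[s], nullptr);
    }

    // Wait for all streams, then clean up.
    for (int s = 0; s < numStreams; ++s)
    {
        cudaStreamSynchronize(streams[s]);
        cudaStreamDestroy(streams[s]);
        contexts[s]->destroy();
    }
}
```

By contrast, the trtexec loop quoted above indexes the context array with `offset` instead of `offset + s`, so every stream shares one context.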
Am I right?
But then why does TensorRT ship such a bad example, with undefined behavior?
It is not related to a custom model; it is related to the undefined behaviour in your trtexec example, as I described above …
Running enqueueV2() from the same execution context on different streams concurrently is undefined behaviour, and your code does exactly that.