I have an SSD model with a custom layer. I initialize the TensorRT network with a plugin factory that creates my plugin.
The issue is that when I run inference and pass IExecutionContext::enqueue a stream created with CU_STREAM_NON_BLOCKING, the network returns invalid results, but when I create the stream with CU_STREAM_DEFAULT everything works fine.
Looking in Nsight, I see that most of the network kernels, including my custom layer, are executed on the stream provided to IExecutionContext::enqueue, while a few kernels named “void copy_kernel<float, int=0>(cublasCopyParams)” are executed on another stream.
It seems like one of the layers uses a stream other than the one provided to enqueue.
Can I work around this? I want to avoid synchronization with the default stream.