How should batching be handled in TensorRT custom plugin implementations? (Does TensorRT create separate CUDA streams for each batch?)

I have written a custom TensorRT plugin that calls a custom CUDA kernel which operates on a single batch item at a time (each kernel launch computes the output for one batch item). Inside the enqueue function of my plugin, I invoke the kernel batchSize times in a for loop, as in the pseudo-code below:

int CustomPlugin::enqueue(int batchSize, const void *const *inputs, void **outputs, void *workspace, cudaStream_t stream)
{
    size_t inputOffset = … ;   // bytes per batch item in the input tensor
    size_t outputOffset = … ;  // bytes per batch item in the output tensor
    for (int i = 0; i < batchSize; i++)
    {
        // Launch the kernel separately for each batch item, on the stream TensorRT provides.
        launchCustomCudaKernel(static_cast<const char *>(inputs[0]) + i * inputOffset,
                               static_cast<char *>(outputs[0]) + i * outputOffset,
                               stream);
    }
    return 0;
}

The above pseudo-code launches the CUDA kernel once per batch index, which does not seem optimal. So if I want my custom plugin to run optimally, do I need to re-implement my CUDA kernel so that it handles all batch items internally (i.e. a single launch processes the whole batch)? Or does TensorRT internally create a separate CUDA stream for each batch index, so that each batch item runs on its own stream and batchSize is always 1?

Hi,

For optimal performance, I would recommend re-implementing the CUDA kernel in your custom plugin so that it handles all batch items internally in a single launch. TensorRT does not create a separate stream per batch index; enqueue is called once with the full batch on the single stream passed in, so batchSize will generally be greater than 1.
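As a rough illustration only, here is a minimal sketch of what a batched version could look like, assuming an element-wise operation on float data; the names batchedCustomKernel and elementsPerItem are hypothetical placeholders, not part of your plugin:

__global__ void batchedCustomKernel(const float *input, float *output,
                                    int batchSize, int elementsPerItem)
{
    // One thread per element across the whole batch (placeholder element-wise op).
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < batchSize * elementsPerItem)
    {
        output[idx] = input[idx] * 2.0f;  // replace with your real per-element computation
    }
}

int CustomPlugin::enqueue(int batchSize, const void *const *inputs, void **outputs,
                          void *workspace, cudaStream_t stream)
{
    // elementsPerItem is assumed to be known to the plugin (e.g. stored at configure time).
    int total = batchSize * elementsPerItem;
    int threads = 256;
    int blocks = (total + threads - 1) / threads;
    // A single launch covers every batch item, on the stream TensorRT provides.
    batchedCustomKernel<<<blocks, threads, 0, stream>>>(
        static_cast<const float *>(inputs[0]),
        static_cast<float *>(outputs[0]),
        batchSize, elementsPerItem);
    return cudaGetLastError() == cudaSuccess ? 0 : -1;
}

This way the GPU sees one large grid instead of batchSize small ones, which avoids the per-launch overhead and gives the scheduler more parallel work.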

Thanks