Batch size performance differs greatly between the two ways of applying TensorRT

I know there are two ways to apply TensorRT: converting the model directly into a TensorRT engine, or adding layers to an engine so that TensorRT runs as a subnetwork. I use the second approach in my inference framework. When I set the batch size to 1, my framework is about as fast as TensorRT, but when I set the batch size greater than one, TensorRT becomes much faster. So I would like to ask whether there is a problem with my settings, or whether TensorRT has additional optimizations for larger batches.

Here are the statements where I set the batch size in the framework. On a 2080 Ti graphics card, a single batch of the ResNet model ran in 1.6 ms, while running a batch of 8 took only 2.8 ms.
builder->setMaxBatchSize(batch_size);                  // largest batch the built engine will accept
builder->setMaxWorkspaceSize(TENSORRT_MAX_WORKSPACE);  // scratch memory TensorRT may use when selecting kernels
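
For context, here is a minimal sketch of the implicit-batch build path those two calls belong to (TensorRT 6-era API; the Logger, the ONNX model path, and the 1 GiB workspace value are placeholders, not taken from the post above):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>

// Minimal logger required by the TensorRT builder.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

nvinfer1::ICudaEngine* buildEngine(const char* onnxPath, int batchSize) {
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();  // implicit-batch network
    nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile(onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    // The two calls from the post: the engine is built once for the maximum
    // batch size and can then be executed with any batch size up to that max.
    builder->setMaxBatchSize(batchSize);
    builder->setMaxWorkspaceSize(1 << 30);  // 1 GiB, stand-in for TENSORRT_MAX_WORKSPACE

    return builder->buildCudaEngine(*network);
}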

Hello,

Correct. In general, TensorRT performs better with larger batch sizes.

Thank you.
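
One way to check this on your own engine is to time execute() directly with CUDA events. A minimal sketch, assuming an implicit-batch engine built as in the post above (context, bindings, and the helper name are illustrative):

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Times a single execute() call at the given batch size, in milliseconds.
float timeExecuteMs(nvinfer1::IExecutionContext* context,
                    void** bindings, int batchSize) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    context->execute(batchSize, bindings);  // implicit-batch execution
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

Per-sample latency usually falls as the batch grows because kernel launch overhead and weight traffic are amortized across the batch, which is consistent with the 1.6 ms (batch 1) vs. 2.8 ms (batch 8) numbers above.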

@NVESJ but what about even larger batch sizes (64 and 128)? I noticed a significant performance drop (batch 32 has the best performance). I created a topic on that issue, and I’m willing to share my scripts and hardware info in order to get to the bottom of this, as I don’t think this is desirable behavior.
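
To find where throughput peaks on a given GPU, one can sweep batch sizes with a timing helper like the one sketched above (timeExecuteMs, context, and bindings are the illustrative names from that sketch; std::printf needs <cstdio>):

// Inside a benchmark function, after building the engine with max batch 128:
// the sweet spot is where ms/sample stops improving.
for (int b : {1, 8, 32, 64, 128}) {
    float ms = timeExecuteMs(context, bindings, b);
    std::printf("batch %3d: %6.2f ms total, %5.3f ms/sample\n", b, ms, ms / b);
}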