I know there are two ways to apply TensorRT: converting the whole model directly into a TensorRT engine, or adding layers to an engine so that TensorRT runs as a subnetwork. I use the second approach in my inference framework. When I set the batch size to 1, my framework is about as fast as TensorRT, but when I set the batch size to be greater than one, TensorRT becomes much faster. So I would like to ask whether there is a problem with my settings, or whether TensorRT has additional optimizations for multiple batches.
Here are the statements where I set the batch size in the framework. On a 2080Ti, a single batch of the ResNet model runs in 1.6 ms, while a batch of 8 takes only 2.8 ms (about 0.35 ms per image).
builder->setMaxBatchSize(batch_size);                  // maximum batch size the engine will support
builder->setMaxWorkspaceSize(TENSORRT_MAX_WORKSPACE);  // scratch memory available for layer tactics
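For context, here is a minimal sketch of how an engine is built and timed with the implicit-batch API (TensorRT 5/6 era, matching the setMaxBatchSize/setMaxWorkspaceSize calls above). The logger, the layers added to the network, and the device buffers d_input/d_output are placeholders for my setup, not exact code:

#include <NvInfer.h>
#include <cuda_runtime.h>

nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
nvinfer1::INetworkDefinition* network = builder->createNetwork();
// ... populate the network by adding layers (the TensorRT subnetwork in my case) ...

builder->setMaxBatchSize(batch_size);                  // build tactics for up to this batch size
builder->setMaxWorkspaceSize(TENSORRT_MAX_WORKSPACE);
nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
nvinfer1::IExecutionContext* context = engine->createExecutionContext();

// Device buffers must be sized for batch_size * per-sample volume.
void* bindings[] = { d_input, d_output };

// Time one execution with CUDA events (how the ms numbers above are measured).
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start);
context->execute(batch_size, bindings);  // synchronous; enqueue() for async on a stream
cudaEventRecord(stop);
cudaEventSynchronize(stop);
float ms = 0.f;
cudaEventElapsedTime(&ms, start, stop);

With this API the engine is built once for the maximum batch size and can then be executed with any batch size up to that maximum.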