I am trying to use TensorRT for deployment. I have tested a few of your examples and see performance gains for the GoogLeNet and AlexNet architectures. However, when I test TensorRT with the VGG architecture, I am unable to observe any performance gain. I would like to know:
- What sort of optimizations do you carry out?
- Are there any layer specific optimizations?
- Did you observe performance gains for VGG architecture?
- What is the performance gain for the Faster R-CNN architecture?
For Faster R-CNN with a batch size of 1, the observed FPS was 6.
For the GoogLeNet architecture, Caffe took 13 ms for inference (batch size of 1), whereas TensorRT took 4.56 ms.
For the AlexNet architecture, Caffe took 6.68 ms, whereas TensorRT took 3.857 ms.
For the VGG architecture, Caffe took 25.76 ms, whereas TensorRT took 25.348 ms.
I am using a Tesla K80 GPU on an x86 platform for my experiments. All of the above experiments were carried out with a batch size of 1.
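For context, the latencies above are average per-inference times. A minimal, framework-agnostic sketch of the kind of timing harness used to collect such numbers (the `infer` callable here is a hypothetical stand-in for the actual Caffe or TensorRT inference call, and the warm-up/run counts are assumptions, not the exact values used above):

```python
import time

def benchmark(infer, n_warmup=10, n_runs=100):
    """Return the average latency of `infer()` in milliseconds."""
    for _ in range(n_warmup):
        infer()  # warm-up runs: exclude one-time allocation/JIT costs
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    return (time.perf_counter() - start) / n_runs * 1000.0

# Example with a dummy workload standing in for the real inference call:
latency_ms = benchmark(lambda: sum(i * i for i in range(1000)))
print(f"average latency: {latency_ms:.3f} ms")
```

Note that for GPU inference the call being timed must be synchronous (or followed by an explicit device synchronization), otherwise the measured time reflects only kernel launch overhead.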
If you have numbers for the SSD architecture, please post them here.