I’m testing the INT8 inference performance of TensorRT 2 on a P4.
A sample, sample_int8.cpp, is provided with the TensorRT 2 installation and can be used to test the performance of networks such as GoogLeNet, VGG, and ResNet. The flow of the sample is basically to read batches of data for calibration first, then read further batches for INT8 inference and compute the Top-1 and Top-5 accuracy scores.
The problem is that there is little documentation on the data structure that must be fed to the network for RGB images, which are what GoogLeNet, VGG, and ResNet classify. The MNIST sample reads grayscale images, and the GoogLeNet sample (a separate sample file) reads dummy data.
From the source code, it can be inferred that the image data should be written as binary files named “batch0”, “batch1”, … under a “batches” directory.
Is there any example of preparing RGB images for use with sample_int8.cpp? Any help is appreciated.
Also, regarding inference performance: according to https://developer.nvidia.com/tensorrt, the P40 achieves about 6500 fps for INT8 inference on GoogLeNet. Since the P4 has roughly half the INT8 TOPS of the P40, I would expect around 3000 fps.
But when I run sample_int8.cpp, I get less than 1000 fps. Is there anything I am missing? I followed the installation guide and successfully installed CUDA 8.0 and TensorRT 2 on Ubuntu 14.04.