TensorRT 2 INT8 samples

I’m testing the performance of TensorRT 2 using INT8 inference on P4.

A sample_int8.cpp is provided with the TensorRT 2 installation, which can be used to test the performance of networks including GoogLeNet, VGG and ResNet. The flow of the sample is basically to read batches of data for calibration first, then read further batches for INT8 inference and calculate Top1 and Top5 accuracy scores.

The problem is that there is little documentation on the data structure to be fed to the network for RGB images, which are used for classification in GoogLeNet, VGG and ResNet. The MNIST sample reads grayscale images, and the GoogLeNet sample (a separate sample file) reads dummy data.

From the source code, it can be inferred that the image data should be written in binary files named “batch0”, “batch1”, … under a “batches” directory.

Is there any example for preparing RGB images for use in the sample_int8.cpp? Any help is appreciated.

Also, regarding inference performance: according to https://developer.nvidia.com/tensorrt, the P40 card achieves INT8 inference on GoogLeNet at about 6500 fps. Since the INT8 TOPS of the P4 is approximately half that of the P40, the expected inference performance should be around 3000 fps.

But when running sample_int8.cpp, the performance obtained is less than 1000 fps. Is there anything I am missing? I followed the installation guide and successfully installed CUDA 8.0 and TensorRT 2 on Ubuntu 14.04.

QinglinTian,

Good questions.

RGB input data: We’ve discussed internally how deeply to document the process of preparing incoming images before feeding them into the first layer of a neural network. I don’t think any of the samples that ship with TensorRT cover it. I’ve emailed some of my colleagues to see if any of them have suggestions.

P40 perf: What batch size are you using? I am looking at our internal perf numbers and the batch size will matter a fair amount. Try 128 images / batch.

I’d like to know what you find.

QinglinTian,

Some of my colleagues chimed in to point me to a few samples you might want to check out that do RGB processing.

https://github.com/dusty-nv/jetson-inference/blob/master/imageNet.cu
This converts RGB to the planar BGR format that AlexNet and GoogLeNet derivatives use, and applies mean-value subtraction. It is a CUDA function called by the network primitive classes before invoking TensorRT.

The GRE inference demo does this as well. Specifically, see the preprocess subroutine in https://github.com/NVIDIA/gpu-rest-engine/blob/master/inference/classification.cpp

Hope this helps!

-Chris

Update: it turns out that my P4 card has a hardware defect.

============================================================

I managed to get reasonable accuracy results using VGG16.

Another question that arises when using the VGG16 network: in sampleINT8.cpp there is a calibration table defining cutoff and quantileIndex values for different networks, but VGG16 is missing. What is the recommended calibration parameter setting for the VGG16 network, and how are these values determined?

Regarding performance, the stats in the log indicate that the time to process one image is slightly longer than 1 ms, so the frame rate is less than 1000 fps at batch size 100. I also tried batch size 128, with regenerated “batch0, batch1, …, batch9” files and the corresponding configuration changes in sample_int8.cpp, but the performance is similar, still under 1000 fps.

Also, I noticed that the accuracy loss reported for INT8 compared to FP32 is relatively low, while what I’m getting on AlexNet and VGG is noticeably higher.

Can you provide more details on the evaluation of accuracy?

I can’t find an ‘int8’ folder in my gie_samples. Can you tell me where you got it?

Are you using TensorRT version 2?

Yes.

I’m not sure where the problem is. I don’t have a gie_samples dir, only a tensorrt dir.

Hello QinglinTian,

I have tested ./sample_int8 on a 1080 Ti and got the accuracy below. Is this reasonable?
:~/no_backup/d1230/TensorRT-2.1.2/bin> ./sample_int8 mnist

INT8 run:400 batches of size 100 starting at 100

Top1: 0.9918, Top5: 1
Processing 40000 images averaged 0.00144474 ms/image and 0.144474 ms/batch.

FP32 run:400 batches of size 100 starting at 100

Top1: 0.9918, Top5: 1
Processing 40000 images averaged 0.00220669 ms/image and 0.220669 ms/batch.

I would like to test the same INT8 inference on GoogLeNet, ResNet and VGG.
Can you please share what to change in sampleINT8.cpp, how to generate the batches, etc.?
It would be great if you could share everything required to run INT8 on GoogLeNet, VGG and ResNet.

Thanks a lot !!!