FP16 --half2=true option doesn't work on GTX 1080 Ti although ./sample_int8 INT8 runs fine

Hello

I am using a GTX 1080 Ti (with DP4A and DP2A support) with TensorRT 2.1 (CUDA 8.0 and cuDNN 6.0).
With TensorRT I am able to run INT8 inference on the MNIST dataset, since the 1080 Ti provides INT8 support, but FP16 inference does not work.

See logs below:

/TensorRT-2.1.2/data/mnist> ../../bin/giexec --deploy=lenet.prototxt --model=lenet_iter_10000.caffemodel --output=prob --half2=true
deploy: lenet.prototxt
model: lenet_iter_10000.caffemodel
output: prob
half2
batch: 12
Input "data": 1x28x28
Output "prob": 10x1x1
Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 0.184803 ms.
Average over 10 runs is 0.176173 ms.
Average over 10 runs is 0.172307 ms.
Average over 10 runs is 0.172182 ms.
Average over 10 runs is 0.172362 ms.
Average over 10 runs is 0.170336 ms.
Average over 10 runs is 0.185437 ms.
Average over 10 runs is 0.171155 ms.
Average over 10 runs is 0.169658 ms.
Average over 10 runs is 0.171379 ms.

Any suggestions? Could it be a driver problem with the GTX 1080 Ti, or something else?

Thanks!

I can't provide a definite answer, but I can offer informed speculation. TensorRT may limit FP16 support to hardware platforms where it actually provides acceleration. Currently the only GPUs with high FP16 throughput are the V100 and P100. All other GPUs, including the GTX 1080 Ti, have very low FP16 throughput, so using FP16 instead of FP32 would actually cause a massive slowdown, which is probably not what you are looking for.
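If you want to check this at the API level instead of relying on giexec's warning, the builder can tell you. Below is only a minimal sketch against the TensorRT C++ builder API; I'm assuming platformHasFastFp16() is exposed in your 2.1 release (it definitely is in later versions), so treat the exact calls as a guideline rather than gospel:

#include "NvInfer.h"
#include <iostream>

// Minimal logger; any nvinfer1::ILogger implementation will do.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);

    // Only request half2 (paired FP16) kernels when the GPU has fast native
    // FP16; a GTX 1080 Ti should report false here, so stay with FP32.
    if (builder->platformHasFastFp16())
        builder->setHalf2Mode(true);
    else
        std::cout << "No fast native FP16 on this GPU, building in FP32" << std::endl;

    builder->destroy();
    return 0;
}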

Thank you for your quick response. That could certainly be the reason.

One more quick question:

While running INT8 inference, I got the output below. The accuracies look great, and there is also an improvement of ~36% in the time needed to infer one image when going from FP32 to INT8. Is this improvement expected on the MNIST dataset, or is it too small? Might it be larger on bigger models such as ImageNet?

linux:/TensorRT-2.1.2/bin> ./sample_int8 mnist

INT8 run:400 batches of size 100 starting at 100

Top1: 0.9909, Top5: 1
Processing 40000 images averaged 0.0014101 ms/image and 0.14101 ms/batch.

FP32 run:400 batches of size 100 starting at 100

Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.00220697 ms/image and 0.220697 ms/batch.
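For reference, the ~36% figure is taken straight from the times above: (0.220697 - 0.14101) / 0.220697 ≈ 0.36, i.e. INT8 needs about 36% less time per batch (a ~1.57x speedup) than FP32 on this network.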

Do you also have an idea how the calibration table and calibration batches are generated for other models such as ImageNet, the way they are for MNIST?
I want to test INT8 inference on ImageNet next.
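For what it's worth, my rough understanding from the MNIST sample is that the calibrator only has to hand TensorRT batches of preprocessed images through getBatch(), and TensorRT then produces the calibration table itself. The sketch below is just my guess at an ImageNet version: the ImagenetBatchStream reader and all of its names are my own placeholders, and it is written against the IInt8EntropyCalibrator interface from newer TensorRT releases (the 2.1 calibrator interface may differ), so please correct me if this picture is wrong:

#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <vector>

// Placeholder for something like the sample's BatchStream: it would read
// preprocessed ImageNet batch files (raw NCHW float data) from disk.
class ImagenetBatchStream
{
public:
    int getBatchSize() const { return mBatchSize; }
    int getImageSize() const { return 3 * 224 * 224; }   // assumed input shape
    bool next() { /* load the next batch file into mHostData */ return false; }
    const float* getBatch() const { return mHostData.data(); }
private:
    int mBatchSize{50};
    std::vector<float> mHostData;
};

// My guess at an ImageNet calibrator, modeled on the MNIST sample.
class ImagenetCalibrator : public nvinfer1::IInt8EntropyCalibrator
{
public:
    ImagenetCalibrator(ImagenetBatchStream& stream) : mStream(stream)
    {
        mInputCount = mStream.getBatchSize() * mStream.getImageSize();
        cudaMalloc(&mDeviceInput, mInputCount * sizeof(float));
    }
    ~ImagenetCalibrator() { cudaFree(mDeviceInput); }

    int getBatchSize() const override { return mStream.getBatchSize(); }

    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        if (!mStream.next())
            return false;                              // no more calibration batches
        cudaMemcpy(mDeviceInput, mStream.getBatch(), mInputCount * sizeof(float),
                   cudaMemcpyHostToDevice);
        bindings[0] = mDeviceInput;                    // assumes a single input binding ("data")
        return true;
    }

    // Returning nullptr makes TensorRT run calibration and then hand the
    // table back through writeCalibrationCache(), where it can be saved.
    const void* readCalibrationCache(size_t& length) override { length = 0; return nullptr; }
    void writeCalibrationCache(const void* cache, size_t length) override { /* save to file */ }

private:
    ImagenetBatchStream& mStream;
    size_t mInputCount{0};
    void* mDeviceInput{nullptr};
};

I assume the batch files themselves would just be the network's normal ImageNet preprocessing (resize/crop to the input size, mean subtraction) dumped as raw floats, but I'd appreciate confirmation.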

Thanks