I am using a GTX 1080 Ti (with DP4A and DP2A support) with TensorRT 2.1 (CUDA 8.0 and cuDNN 6.0).
With TensorRT I am able to run INT8 inference on the MNIST dataset, since the 1080 Ti provides INT8 support, but it doesn't do FP16 inference.
See logs below:
/TensorRT-2.1.2/data/mnist> ../../bin/giexec --deploy=lenet.prototxt --model=lenet_iter_10000.caffemodel --output=prob --half2=true
deploy: lenet.prototxt
model: lenet_iter_10000.caffemodel
output: prob
half2
batch: 12
Input "data": 1x28x28
Output "prob": 10x1x1
Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 0.184803 ms.
Average over 10 runs is 0.176173 ms.
Average over 10 runs is 0.172307 ms.
Average over 10 runs is 0.172182 ms.
Average over 10 runs is 0.172362 ms.
Average over 10 runs is 0.170336 ms.
Average over 10 runs is 0.185437 ms.
Average over 10 runs is 0.171155 ms.
Average over 10 runs is 0.169658 ms.
Average over 10 runs is 0.171379 ms.
Any suggestions? Is it maybe a driver problem with the GTX 1080 Ti, or something else?
I can't provide a definite answer, but I can offer informed speculation. TensorRT may limit FP16 support to hardware platforms where it actually provides acceleration. Currently the only GPUs with high FP16 throughput are the V100 and P100. All other GPUs, including the GTX 1080 Ti, have very low FP16 throughput, so using FP16 instead of FP32 would actually cause a massive slowdown, which is probably not what you are looking for.
Thank you for your quick response. That could well be the reason.
One more quick question:
While running INT8 inference, I got the output below. The accuracies look great, and there is also an improvement of ~36% in the time required to infer one image going from FP32 to INT8. Is this improvement expected on the MNIST dataset, or is it too small? Might it be larger on bigger models such as ImageNet?
linux:/TensorRT-2.1.2/bin> ./sample_int8 mnist
INT8 run:400 batches of size 100 starting at 100
…
Top1: 0.9909, Top5: 1
Processing 40000 images averaged 0.0014101 ms/image and 0.14101 ms/batch.
FP32 run:400 batches of size 100 starting at 100
…
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.00220697 ms/image and 0.220697 ms/batch.
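To double-check the ~36% figure, I computed it from the per-image times in the logs above (plain arithmetic, nothing TensorRT-specific):

```python
# Per-image times reported by sample_int8 (ms/image), taken from the log output.
int8_ms = 0.0014101
fp32_ms = 0.00220697

# Relative reduction in inference time going from FP32 to INT8,
# and the equivalent speedup factor.
reduction = (fp32_ms - int8_ms) / fp32_ms
speedup = fp32_ms / int8_ms

print(f"time reduction: {reduction:.1%}")  # -> time reduction: 36.1%
print(f"speedup factor: {speedup:.2f}x")   # -> speedup factor: 1.57x
```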
Do you also have an idea how the calibration table and batch files are to be generated for other models such as ImageNet, the way they are for MNIST? I want to test INT8 inference on ImageNet next.
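For reference, from reading the BatchStream code shipped with sampleINT8, my assumption (please correct me if this is wrong) is that each calibration batch file is just raw binary: four int32 dims (N, C, H, W), followed by N*C*H*W float32 image values, followed by N float32 labels. If that holds, this is the kind of minimal sketch I would use to write one batch for ImageNet-sized inputs; the file name `batch0` and the fake random data are purely illustrative:

```python
import random
import struct

def write_batch_file(path, images, labels, c, h, w):
    """Write one calibration batch in the raw layout I am ASSUMING
    sampleINT8's BatchStream reads: int32 dims (N, C, H, W), then
    N*C*H*W float32 pixel values, then N float32 labels."""
    n = len(labels)
    assert len(images) == n * c * h * w
    with open(path, "wb") as f:
        f.write(struct.pack("4i", n, c, h, w))
        f.write(struct.pack(f"{len(images)}f", *images))
        f.write(struct.pack(f"{n}f", *labels))

# Hypothetical example: one batch of 2 fake 3x224x224 "ImageNet" images.
# Real use would substitute preprocessed pixel data and real class labels.
n, c, h, w = 2, 3, 224, 224
images = [random.random() for _ in range(n * c * h * w)]
labels = [float(random.randrange(1000)) for _ in range(n)]
write_batch_file("batch0", images, labels, c, h, w)
```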