TensorRT fp16 plugin

Hi,
I noticed that in half mode, before executing a plugin, TensorRT converts the data to float32 and at the end converts it back to float16.
Is it possible to write a plugin that works directly on the fp16 data, without the conversion?
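
For illustration, here is a rough sketch of what such a plugin could look like with a plugin interface that lets the plugin advertise FP16 support (newer TensorRT releases expose this through IPluginExt / IPluginV2; only the format-related overrides are shown, and the class name is made up):

// Sketch only: assumes a TensorRT release that provides nvinfer1::IPluginExt.
#include <NvInfer.h>

class FP16ActivationPlugin : public nvinfer1::IPluginExt
{
public:
    // Tell the builder this plugin accepts FP16 (and FP32) NCHW tensors.
    bool supportsFormat(nvinfer1::DataType type, nvinfer1::PluginFormat format) const override
    {
        return (type == nvinfer1::DataType::kHALF || type == nvinfer1::DataType::kFLOAT)
            && format == nvinfer1::PluginFormat::kNCHW;
    }

    // Remember which type the builder actually selected, so enqueue() can
    // dispatch to an FP16 or FP32 kernel accordingly.
    void configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs,
                             const nvinfer1::Dims* outputDims, int nbOutputs,
                             nvinfer1::DataType type, nvinfer1::PluginFormat format,
                             int maxBatchSize) override
    {
        mType = type;
    }

private:
    nvinfer1::DataType mType{nvinfer1::DataType::kFLOAT};

    // ... the remaining IPlugin methods (getNbOutputs, getOutputDimensions,
    // initialize, terminate, getWorkspaceSize, enqueue, getSerializationSize,
    // serialize) still have to be implemented as usual ...
};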

thanks

What advantages do you think such a solution would have?

Since I have a network model like this:
Conv2D
Activation
Conv2D
Activation
Pool
Conv2D
Activation
Conv2D
Activation
etc.

with a custom activation function implemented as a TensorRT plugin, it has to do a lot of conversions to run in half-precision mode.
Because of that it gains much less speedup than the same network with ReLU activations.
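
For illustration, if the plugin could be handed __half buffers directly, the FP16/FP32 conversion would not need to be a separate pass over the tensor; it can happen per element in registers inside the activation kernel. A rough sketch, with swish (x * sigmoid(x)) standing in for the actual custom activation:

// Sketch only: a custom activation that reads and writes FP16 directly but
// does the arithmetic in FP32 registers. The conversion intrinsics work on
// any recent GPU; no fast-FP16 hardware is required for this pattern.
#include <cuda_fp16.h>

__global__ void customActivationHalfIO(const __half* in, __half* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        float x = __half2float(in[i]);    // FP16 -> FP32 in registers
        float y = x / (1.0f + expf(-x));  // placeholder: swish(x)
        out[i] = __float2half(y);         // FP32 -> FP16 in registers
    }
}

// Example launch, e.g. from a plugin's enqueue():
//   int threads = 256;
//   int blocks  = (count + threads - 1) / threads;
//   customActivationHalfIO<<<blocks, threads, 0, stream>>>(
//       static_cast<const __half*>(inputs[0]),
//       static_cast<__half*>(outputs[0]), count);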

Does the CUDA profiler show that the code is limited by computational throughput?

FP16 computation in intermediate steps only makes sense if you have a GPU with high FP16 throughput. As far as I am aware, there are only two of those at the moment: P100 and V100. If you have one of those, more power to you.

All other GPUs have only rudimentary FP16 computation capabilities (or none), so doing intermediate computation in FP32 is the way to go for performance. The FP16/FP32 conversion overhead is often negligible, drowned out by memory traffic. Using FP16 as a storage format helps reduce this memory overhead.

Depending on the specifics of your computation, doing everything in FP16 may also negatively affect accuracy.
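
For completeness, on the high-FP16-throughput GPUs mentioned above the arithmetic itself can also be done on packed half2 data, two values per instruction. A minimal sketch (requires compute capability 5.3 or later; the scale/bias operation and kernel name are just examples):

// Sketch only: packed FP16 arithmetic (two values per instruction) for GPUs
// with fast FP16 (e.g. P100/V100). Requires compute capability >= 5.3; on
// other GPUs the FP16-storage/FP32-compute pattern is the better choice.
#include <cuda_fp16.h>

__global__ void scaleBiasHalf2(const __half2* in, __half2* out, int nPairs,
                               __half2 scale, __half2 bias)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPairs)
    {
        // out = scale * in + bias, computed on two FP16 values at once
        out[i] = __hfma2(scale, in[i], bias);
    }
}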

Hi,

I am using TensorRT 2.1 for inference with my Caffe models on a GTX 1080 Ti, which has FP16 and INT8 support.
I am able to run ./sample_mnist_int8 successfully, but when I run “bin/giexec --deploy=lenet.prototxt --model=lenet_iter_10000.caffemodel --output=prob --half2=true”, TensorRT prints the warning “Half2 support requested on hardware without native FP16 support, performance will be negatively affected.”

I don’t understand why INT8 works but FP16 does not, since the 1080 Ti has both features. See the logs below:

~/no_backup/d1230/TensorRT-2.1.2/data/mnist> ../../bin/giexec --deploy=lenet.prototxt --model=lenet_iter_10000.caffemodel --output=prob --half2=false --batch=12

deploy: lenet.prototxt
model: lenet_iter_10000.caffemodel
output: prob
half2
batch: 12
Input “data”: 1x28x28
Output “prob”: 10x1x1
Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 0.184803 ms.
Average over 10 runs is 0.176173 ms.
Average over 10 runs is 0.172307 ms.
Average over 10 runs is 0.172182 ms.
Average over 10 runs is 0.172362 ms.
Average over 10 runs is 0.170336 ms.
Average over 10 runs is 0.185437 ms.
Average over 10 runs is 0.171155 ms.
Average over 10 runs is 0.169658 ms.
Average over 10 runs is 0.171379 ms.
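
One way to double-check what TensorRT itself reports for this GPU would be to query the builder directly. A minimal sketch, assuming the platformHasFastFp16()/platformHasFastInt8() queries on IBuilder are available in this release:

// Sketch only: ask the builder what the platform supports before requesting
// a precision mode. On a GTX 1080 Ti I would expect fast INT8 but not fast
// FP16, which would match the warning above.
#include <NvInfer.h>
#include <cstdio>

// Minimal logger, as required by createInferBuilder().
class Logger : public nvinfer1::ILogger
{
public:
    void log(Severity severity, const char* msg) override
    {
        std::printf("[TRT] %s\n", msg);
    }
} gLogger;

int main()
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    std::printf("fast FP16: %d\n", builder->platformHasFastFp16());
    std::printf("fast INT8: %d\n", builder->platformHasFastInt8());
    builder->destroy();
    return 0;
}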

Thanks !!!