Running Caffe with FP16 precision

Hi,
I am trying to run a network that was trained in the Caffe framework.
The issue is that with dtype=trt.float32 and dtype=trt.float16 I get the same inference timing.
When I try to set the builder as in the TensorFlow workflow (builder.fp16_mode = True), the Jetson crashed.
Is there any additional configuration I have to do?

thanks

Hi,

You can run inference on a Caffe model with the TensorRT API directly; there is no need to go through the TensorFlow framework.
Here is a sample for your reference:
/usr/src/tensorrt/samples/sampleGoogleNet/

Or you can just test your model with trtexec:

$ cd /usr/src/tensorrt/bin/
$ ./trtexec --deploy=[xxx.prototxt] --output=[output layer]           #float32 mode
$ ./trtexec --deploy=[xxx.prototxt] --output=[output layer]  --fp16   #float16 mode
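
If you are using the Python API instead, the same Caffe-to-engine flow looks roughly like this (a sketch against the pre-TensorRT-8 Python API used by the bundled samples; the file paths and the output blob name are placeholders you need to fill in):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(deploy_file, model_file, output_name):
    # Sketch of the pre-TensorRT-8 Caffe workflow; arguments are placeholders.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.CaffeParser() as parser:
        # Maximum scratch memory the builder may use while optimizing.
        builder.max_workspace_size = 1 << 30  # 1 GiB
        # Parse the deploy prototxt and trained weights into the network.
        model_tensors = parser.parse(deploy=deploy_file, model=model_file,
                                     network=network, dtype=trt.float32)
        # Tell TensorRT which blob to return as the network output.
        network.mark_output(model_tensors.find(output_name))
        return builder.build_cuda_engine(network)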

Thanks.

Hi,
thanks for your response. I am using your example in /usr/src/tensorrt/samples/python/introductory_parser_samples/caffe_resnet50.py.
In this example, to work with FP16 you just change the data type of the builder, but in my case it has no effect.
Is there something additional to do or to check?

Hi,

Could you share the TensorRT output log with us?

TensorRT automatically chooses the fastest kernel based on the model architecture and the hardware resources.
So it is possible that TensorRT chooses an FP32 implementation even when FP16 is specified.
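
One way to check is to confirm that the device reports fast FP16 support and, for debugging only, forbid the FP32 fallback so you can tell whether FP16 kernels are actually selected. A sketch against the same pre-TensorRT-8 Python API (strict type constraints usually hurt performance, so remove the flag once you have verified the precision):

# Inside the builder setup, before building the engine:
if builder.platform_has_fast_fp16:
    builder.fp16_mode = True
    # Debugging aid: disallow falling back to FP32 kernels so a missing
    # FP16 implementation surfaces instead of being silently replaced.
    builder.strict_type_constraints = True
else:
    print("No fast native FP16 support on this platform.")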

Thanks.

Hi,
when I run trtexec on my network I get 40 milliseconds; when I add the --fp16 flag I get 20 milliseconds.
When I run the network with my script I get 30 milliseconds for both datatype = float32 and datatype = float16.
When I add the fp16_mode flag to the builder, the program crashes.
Maybe the Python API can't set the FP16 flag?

Hi,

The update should look like this:

diff --git a/caffe_resnet50.py b/caffe_resnet50.py
index a2bf006..59f93bc 100644
--- a/caffe_resnet50.py
+++ b/caffe_resnet50.py
@@ -102,6 +102,7 @@ def build_engine_caffe(model_file, deploy_file):
         # Workspace size is the maximum amount of memory available to the builder while building an engine.
         # It should generally be set as high as possible.
         builder.max_workspace_size = common.GiB(1)
+        builder.fp16_mode = True
         # Load the Caffe model and parse it in order to populate the TensorRT network.
         # This function returns an object that we can query to find tensors by name.
         model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=ModelData.DTYPE)

Did you apply a similar change?
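
To rule out measurement noise on your side, you could also time the built engine directly. Below is a hypothetical timing snippet built on the helpers that ship with the samples (common.allocate_buffers and common.do_inference from the samples' common.py); the loop count of 100 is arbitrary:

import time

inputs, outputs, bindings, stream = common.allocate_buffers(engine)
with engine.create_execution_context() as context:
    # Warm-up run so lazy initialization does not skew the numbers.
    common.do_inference(context, bindings=bindings, inputs=inputs,
                        outputs=outputs, stream=stream)
    start = time.time()
    for _ in range(100):
        common.do_inference(context, bindings=bindings, inputs=inputs,
                            outputs=outputs, stream=stream)
    print("mean latency: %.2f ms" % ((time.time() - start) * 1000.0 / 100))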
Thanks.

Hi,
thank you for the response. Yes, I tried that, and I got 30 milliseconds for FP32 and 25 milliseconds for FP16, which is a bit strange because I expected a bigger improvement in performance.
But thank you anyway.