Running Caffe with FP16 precision

I am trying to run a network that was trained in the Caffe framework.
The issue is that with dtype=trt.float32 and dtype=trt.float16 I get the same inference timing.
When I try to set the builder as in the TensorFlow workflow -> builder.fp16_mode = True, the Jetson crashes.
Is there any additional configuration I have to do?



You can run inference on a Caffe model with the TensorRT API directly; there is no need to use the TensorFlow framework.
Here is a sample for your reference:

Or you can just test your model with trtexec:

$ cd /usr/src/tensorrt/bin/
$ ./trtexec --deploy=[xxx.prototxt] --output=[output layer]           #float32 mode
$ ./trtexec --deploy=[xxx.prototxt] --output=[output layer]  --fp16   #float16 mode


Thanks for your response. I am using your example in /usr/src/tensorrt/samples/python/introductory_parser_samples/.
In this example, to work with FP16 you just change the data type of the builder, but in my case it has no effect.
Maybe there is something additional to do or to check?


Could you share the TensorRT output log with us?

TensorRT automatically chooses the fastest kernel based on the model architecture and the available hardware resources.
So it is possible that TensorRT chooses an FP32 implementation even when FP16 is specified.
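To illustrate the idea, here is a conceptual sketch in plain Python (not TensorRT code): an autotuner benchmarks every candidate implementation on the actual hardware and keeps whichever one runs fastest, so a "faster" precision is only selected if it actually wins the benchmark.

```python
import time

def pick_fastest(implementations, arg, repeats=100):
    """Benchmark each candidate and return the fastest one, loosely
    mimicking how a kernel autotuner selects an implementation."""
    best_impl, best_time = None, float("inf")
    for impl in implementations:
        start = time.perf_counter()
        for _ in range(repeats):
            impl(arg)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_impl, best_time = impl, elapsed
    return best_impl

# Two toy "kernels" that compute the same sum of squares
def kernel_loop(xs):
    total = 0.0
    for x in xs:
        total += x * x
    return total

def kernel_builtin(xs):
    return sum(x * x for x in xs)

fastest = pick_fastest([kernel_loop, kernel_builtin], list(range(1000)))
```

Both candidates produce identical results; only their measured latency on this machine decides which one is kept.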


When I run trtexec on my network I get 40 milliseconds; when I add the flag --fp16 I get 20 milliseconds.
When I run the network with my script I get 30 milliseconds for both datatype = float32 and datatype = float16.
When I add the fp16_mode flag to the builder, the program crashes.
Maybe the FP16 flag cannot be set with the Python API?
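In case it matters, my timing loop looks roughly like this (simplified; infer is only a placeholder here for the actual TensorRT execution call):

```python
import time

def time_inference(infer, inputs, warmup=10, runs=100):
    """Average per-call latency in milliseconds, excluding warm-up runs."""
    for _ in range(warmup):  # warm-up: exclude one-time setup costs
        infer(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        infer(inputs)
    return (time.perf_counter() - start) / runs * 1e3

# Placeholder inference call; the real script invokes the TensorRT
# execution context here instead.
def infer(inputs):
    return [x * 0.5 for x in inputs]

latency_ms = time_inference(infer, list(range(256)))
```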


The update should look like this:

diff --git a/ b/
index a2bf006..59f93bc 100644
--- a/
+++ b/
@@ -102,6 +102,7 @@ def build_engine_caffe(model_file, deploy_file):
         # Workspace size is the maximum amount of memory available to the builder while building an engine.
         # It should generally be set as high as possible.
         builder.max_workspace_size = common.GiB(1)
+        builder.fp16_mode = True
         # Load the Caffe model and parse it in order to populate the TensorRT network.
         # This function returns an object that we can query to find tensors by name.
         model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=ModelData.DTYPE)

Did you apply a similar change?

Thank you for the response. Yes, I tried that, and I got 30 milliseconds for FP32 and 25 milliseconds for FP16, which is a little strange because I expected a bigger performance improvement.
But thank you anyway.