dtype in Caffe parser

I am converting a Caffe network to TensorRT on a Jetson Nano.
I am trying to run with FP16. Besides setting the fp16_mode flag to true, I also tried setting dtype in the parser to fp16.
When I do so, I get really bad accuracy.
According to the documentation: “dtype – The type to which the weights will be transformed.”
What is the influence of this flag? Is it needed at all when running with FP16?



Could you share your modification detail with us?

I suppose you switched to FP16 by updating the parse call like this:

const nvcaffeparser1::IBlobNameToTensor* blobNameToTensor = parser->parse(
    mParams.prototxtFileName.c_str(), mParams.weightsFileName.c_str(), *network, nvinfer1::DataType::kHALF);


I am using the Python API.
I use the following function (almost identical to your samples):

def build_engine_caffe(model_file, deploy_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = common.GiB(1)
        builder.strict_type_constraints = True
        builder.fp16_mode = True
        model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt.float32)

If I change dtype in the parser to float16, I see no influence.
If I use float16 when allocating buffers, I see strange results.

What is the influence of dtype in the parser? Should buffers be allocated only with float32?



You can check our API document for more information:


parse(self: tensorrt.tensorrt.CaffeParser, deploy: str, model: str, network: tensorrt.tensorrt.INetworkDefinition, dtype: tensorrt.tensorrt.DataType) → tensorrt.tensorrt.IBlobNameToTensor
dtype – The type to which the weights will be transformed.

dtype indicates the precision of your model.
To convert an FP32 model into an FP16 engine, please set dtype=float32 and enable fp16_mode in the builder.


Hi,
but what if I want my model to be an FP16 engine? Shouldn’t the precision of the model also be FP16? Otherwise I don’t see the point of using this flag in the parser.


Sorry, I think my previous comment might have misled you.

If you want to convert a model trained with FP32 into an FP16 TensorRT engine, you need to enable FP16 mode in the builder.
That is the only option you need to enable.

builder.fp16_mode = True

Another related option is the tensor data type, which indicates the tensor data format rather than the model type.
If you want to use the FP16 data type, you need to change the parser dtype option and the input/output buffers as well.

model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt.DataType.HALF)
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(trt.DataType.HALF))

In general, we use the FP32 input/output data format since the values are human readable without quantization.
A corresponding quantization is applied automatically to match the precision between the model and the tensor data.
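To illustrate why the buffer type has to match the tensor type, here is a small sketch that uses only the Python standard library (no TensorRT involved). It compares a proper FP32 to FP16 cast with what happens when FP32 bytes are simply read as FP16, which is one possible way a dtype mismatch between buffer and binding could produce strange values. The helper names cast_to_half and reinterpret_fp32_bytes_as_half are made up for this example, and this is not a diagnosis of the model in this thread.

```python
import struct


def cast_to_half(values):
    """Round each value to the nearest FP16 number and return it as a Python float."""
    # struct format "e" is IEEE 754 half precision (2 bytes per value).
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


def reinterpret_fp32_bytes_as_half(values):
    """Pack values as FP32, then read the very same bytes back as FP16.

    This mimics a buffer whose dtype does not match the data written into it.
    """
    raw = struct.pack("<%df" % len(values), *values)      # 4 bytes per value
    return list(struct.unpack("<%de" % (len(raw) // 2), raw))  # 2 bytes per value


if __name__ == "__main__":
    data = [1.0, 0.1, -2.5]
    print("proper cast:        ", cast_to_half(data))
    print("bytes reinterpreted:", reinterpret_fp32_bytes_as_half(data))
```

With a proper cast the values stay close to the originals (0.1 is only rounded slightly), while the reinterpreted bytes give twice as many values that have nothing to do with the input. For example, the single FP32 value 1.0 is read back as the two FP16 values 0.0 and 1.875.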


First of all, thank you for the response.
I tried setting the type to HALF for both the parser and the buffer allocations, but then I got outputs of 0 and NaN.
What is the advantage of using this flag regarding inference speed?


You can check this GitHub for the performance improvement of fp16:

To investigate your issue, would you mind sharing the complete sample and model for debugging?


Thank you for the response. We decided to continue with 32 bits only for now, so we can close the issue.