dtype in Caffe parser

Hi,
I am converting a Caffe network to TensorRT on a Jetson Nano.
I am trying to run with fp16. Besides setting the fp16_mode flag to true, I also tried to set the parser's dtype to fp16.
When doing so I get really bad accuracy.
According to the documentation: “dtype – The type to which the weights will be transformed.”
What is the influence of this flag? Is it needed at all when running with fp16?

thanks

Hi,

Could you share your modification detail with us?

You should be able to switch to FP16 by updating the parse call as shown below (see the mixed-precision section of the developer guide):
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#mixed_precision

const nvcaffeparser1::IBlobNameToTensor* blobNameToTensor = parser->parse(
    mParams.prototxtFileName.c_str(), mParams.weightsFileName.c_str(), *network, nvinfer1::DataType::kHALF);

Thanks.

Hi,
I am using the Python API.
I use the following function (almost identical to your samples):

def build_engine_caffe(model_file, deploy_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
        builder.max_workspace_size = common.GiB(1)
        builder.strict_type_constraints = True
        builder.fp16_mode = True
        model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt.float32)
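
(The rest of the function follows the sample code: mark the output blob and build the engine. OUTPUT_NAME below is just a placeholder for my actual output blob name.)

        # mark the network output and build the engine, as in the samples
        network.mark_output(model_tensors.find(OUTPUT_NAME))
        return builder.build_cuda_engine(network)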

If I change the dtype in the parser to float16, I see no influence.
If I allocate the buffers with float16, I see strange results.

What is the influence of the dtype in the parser? Should the buffers be allocated only with float32?

thanks

Hi,

You can check our API document for more information:

https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/parsers/Caffe/pyCaffe.html#tensorrt.CaffeParser.parse

parse(self: tensorrt.tensorrt.CaffeParser, deploy: str, model: str, network: tensorrt.tensorrt.INetworkDefinition, dtype: tensorrt.tensorrt.DataType) → tensorrt.tensorrt.IBlobNameToTensor
...
dtype – The type to which the weights will be transformed.

dtype indicates the precision of your model.
To convert an fp32 model into an fp16 engine, please set dtype=float32 and enable fp16_mode in the builder.
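
For example, a minimal sketch of that combination, reusing the same helpers as your function (TRT_LOGGER, common.GiB, deploy_file and model_file are assumed to be defined):

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.CaffeParser() as parser:
    builder.max_workspace_size = common.GiB(1)
    # the builder generates FP16 kernels from the FP32 weights
    builder.fp16_mode = True
    # keep the parsed Caffe weights in FP32 here
    model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt.float32)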

Thanks.

Hi,
But if I want my model to be an fp16 engine, shouldn’t the precision of the model also be fp16? Otherwise I don’t see the point of using this flag in the parser.

Hi,

Sorry, I think my previous comment might have misled you.

If you want to convert a model trained in fp32 into an fp16 TensorRT engine, you need to enable fp16 mode in the builder.
That is the only option you need to enable:

builder.fp16_mode = True

Another related option is the tensor data type, which controls the input/output tensor format rather than the model precision.
If you want to use the fp16 data type, you need to change the parser dtype and the input/output buffers as well:

model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=trt.DataType.HALF)
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(trt.DataType.HALF))
...

In general, we use the fp32 input/output data format since the values are human readable without quantization.
A corresponding quantization is applied automatically to match the precision between the model and the tensor data.
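
For example, with the default fp32 format the buffer allocation stays in float32 even though the engine runs with fp16_mode enabled (a sketch; engine and the binding indices are illustrative):

# fp32 host buffers; the engine converts to/from its internal fp16 precision
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(trt.DataType.FLOAT))
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(trt.DataType.FLOAT))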

Thanks.

Hi,
First of all, thank you for the response.
I have tried to set the type to HALF both for the parser and the buffer allocations, but then I got outputs of 0 and NaN.
What is the advantage of using this flag in terms of inference speed?

Hi,

You can check this GitHub repository for the performance improvement of fp16:
https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification#models

To investigate your issue, would you mind giving us the complete sample and model for debugging?

Thanks.

Hi,
Thank you for the response. We decided to continue with 32-bit only for now, so we can close the issue.