How to replicate the effect of DeepStream's layer-device-precision property in tao-converter?

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) :
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Yolo_v4
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here): 3.21.11_trt8.4_x86
• Training spec file(If have, please share here): NA
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.): NA

Starting from DeepStream 6.1.1, DeepStream supports building mixed-precision engines via the layer-device-precision property, as shown in this config.

I’m currently using tao-converter to build engines from tao models:

tao-converter -h
usage: tao-converter [-h] [-e ENGINE_FILE_PATH]
	[-i INPUT_ORDER] [-s] [-u DLA_CORE]
	[-l engineLayerVerbose]
	[-v TensorRT version]
	[--precisionConstraints PRECISIONCONSTRAINTS]
    [--layerPrecisions layerName:precision]
    [--layerOutputTypes layerName:precision] 

How do I specify the layer precisions listed in DeepStream's yolov4_tao config using tao-converter?

layer-device-precision: cls/mul:fp32:gpu;box/mul_6:fp32:gpu;box/add:fp32:gpu;box/mul_4:fp32:gpu;box/add_1:fp32:gpu;cls/Reshape_reshape:fp32:gpu;box/Reshape_reshape:fp32:gpu;encoded_detections:fp32:gpu;bg_leaky_conv1024_lrelu:fp32:gpu;sm_bbox_processor/concat_concat:fp32:gpu;sm_bbox_processor/sub:fp32:gpu;sm_bbox_processor/Exp:fp32:gpu;yolo_conv1_4_lrelu:fp32:gpu;yolo_conv1_3_1_lrelu:fp32:gpu;md_leaky_conv512_lrelu:fp32:gpu;sm_bbox_processor/Reshape_reshape:fp32:gpu;conv_sm_object:fp32:gpu;yolo_conv5_1_lrelu:fp32:gpu;concatenate_6:fp32:gpu;yolo_conv3_1_lrelu:fp32:gpu;concatenate_5:fp32:gpu;yolo_neck_1_lrelu:fp32:gpu

The closest option in tao-converter seems to be --layerPrecisions, but I can't find a specific example, and I don't know whether --layerPrecisions is the equivalent of layer-device-precision.

For the latest tao-converter, yes, you can use --layerPrecisions.
For example,

./tao-converter -e test_engine -k nvidia_tlt -p Input,1x3x544x960,1x3x544x960,1x3x544x960 --layerPrecisions=cls/Sigmoid:fp32,cls/Sigmoid_1:fp32,box/Sigmoid_1:fp32,box/Sigmoid:fp32,cls/Reshape_reshape:fp32,box/Reshape_reshape:fp32,Transpose2:fp32,sm_reshape:fp32,encoded_sm:fp32,conv_big_object:fp32 --layerOutputTypes=cls/Sigmoid:fp32,cls/Sigmoid_1:fp32,box/Sigmoid_1:fp32,box/Sigmoid:fp32,cls/Reshape_reshape:fp32,box/Reshape_reshape:fp32,conv_sm_object:fp32,sm_reshape:fp32,Transpose2:fp32,encoded_sm:fp32,convolution_output1:fp32,convolution_output2:fp32,convolution_output:fp32 --precisionConstraints=obey yolov3.etlt
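As a side note, the translation between the two formats is mechanical: DeepStream's layer-device-precision uses semicolon-separated `layerName:precision:device` triples, while --layerPrecisions takes comma-separated `layerName:precision` pairs. A small helper script (purely illustrative, not part of either tool) can do the conversion, assuming the device field is always the last colon-separated component:

```python
# Hypothetical helper (not part of tao-converter or DeepStream): translate a
# DeepStream layer-device-precision string into the comma-separated
# layerName:precision pairs that tao-converter's --layerPrecisions expects.
def ds_to_tao_layer_precisions(layer_device_precision: str) -> str:
    pairs = []
    for entry in layer_device_precision.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # rsplit keeps colons that are part of the layer name intact and
        # strips only the trailing precision and device fields.
        name, precision, _device = entry.rsplit(":", 2)
        pairs.append(f"{name}:{precision}")
    return ",".join(pairs)

print(ds_to_tao_layer_precisions("cls/mul:fp32:gpu;box/mul_6:fp32:gpu"))
# → cls/mul:fp32,box/mul_6:fp32
```

The resulting string can be passed directly as the value of --layerPrecisions (and, if desired, --layerOutputTypes).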


Thank you for the reply, how do I achieve the same result with tao-deploy?


I don’t see any option for layerPrecisions in tao_deploy. Does it support building mixed-precision engines?

Only tao-converter supports --layerPrecisions.


Thank you for the quick response.

Since tao-converter is deprecated for the x86 platform, the only other way to build mixed-precision engines now is via DeepStream, right? Should I continue to use tao-converter, or migrate to DeepStream directly instead (i.e., write a DeepStream app just to build engines)?

Will tao-deploy support more features from tao-converter? Even though tao-deploy is recommended over tao-converter, it doesn’t offer all of tao-converter's features. The real alternative to tao-converter for building engines from TAO models is actually DeepStream.

Yes, you can continue to use tao-converter.
