Tao-converter doesn't convert ".etlt" to ".engine"

• Hardware: Nano

• Network Type: FaceNet

• TLT Version: N/A (I am only using tao-converter)

• Training spec file: N/A (I have pretrained models with “.etlt” extensions)

• JetPack Version: 4.6.4

• How to reproduce the issue?

Follow this guide:

  • Select and download the “tao-converter” for JetPack 4.6 from the table under the title “Prerequisites”.

  • Create a folder for tao-converter right under ~ named “tcnv”

  • Open the downloaded tao-converter zip file with “Files” application.

  • Open the folder you have just created with another window of the “Files” application.

  • Drag the contents of the zip file one by one from that window and drop them into the other window, where the folder you just created is open.

  • Open the terminal and change to the ~ directory.

  • Enter these commands:

git clone -b release/tao3.0_ds6.0.1 https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps.git
sudo apt-get install git-lfs
git lfs install
git lfs install --skip-repo
cd deepstream_tao_apps
git lfs pull
export TAO_CONVERTER=/home/User/tcnv/tao-converter
export MODEL_PRECISION=fp16
sudo ./download_models.sh
export CUDA_VER=10.2
make
  • Check “models/faciallandmark” to see whether any TensorRT engine file has been created.

  • Observe that no TensorRT engine file is present inside the directory.

Thanks in advance for all the help you offer.

According to https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tao3.0_ds6.0.1/download_models.sh, this script only downloads models.
It does not convert the etlt models to TensorRT engines.
Did you download tao-converter and run command to generate tensorrt engine?

I have downloaded it and tested it with the /home/User/tcnv/tao-converter -h command.

I was fooled by this line in the readme:

Please run the following script to download pre-trained models and generate GazeNet and Gesture engines with tao-converter tool.

When I checked the Gesture model folder, there were only “.etlt” and calibration files.

I am sorry for taking your time, and thank you for your explanation.

P.S. If you don’t mind, could you explain how I can use tao-converter to generate TensorRT engines for the “heartrate” and “facenet” models, please?

Refer to https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/build_triton_engine.sh.

Thank you for the link.

After referring to the link and trying the command:

/home/User/tcnv/tao-converter -k nvidia_tlt -t int8 -c models/faciallandmark/fpenet_cal.txt -b 1 -d 3,416,736 -e models/faciallandmark/facenet.etlt_b1_gpu0_int8.engine models/faciallandmark/facenet.etlt

I got this output:

[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3443 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3429 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 3460 MiB
[WARNING] Requesting INT8 data type but platform has no support, ignored.
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] conv1/convolution + activation_1/Relu
[INFO] [GpuLayer] block_1a_conv_1/convolution + activation_2/Relu
[INFO] [GpuLayer] block_1a_conv_shortcut/convolution
[INFO] [GpuLayer] block_1a_conv_2/convolution + add_1/add + activation_3/Relu
[INFO] [GpuLayer] block_1b_conv_1/convolution + activation_4/Relu
[INFO] [GpuLayer] block_1b_conv_shortcut/convolution
[INFO] [GpuLayer] block_1b_conv_2/convolution + add_2/add + activation_5/Relu
[INFO] [GpuLayer] block_2a_conv_1/convolution + activation_6/Relu
[INFO] [GpuLayer] block_2a_conv_shortcut/convolution
[INFO] [GpuLayer] block_2a_conv_2/convolution + add_3/add + activation_7/Relu
[INFO] [GpuLayer] block_2b_conv_1/convolution + activation_8/Relu
[INFO] [GpuLayer] block_2b_conv_shortcut/convolution
[INFO] [GpuLayer] block_2b_conv_2/convolution + add_4/add + activation_9/Relu
[INFO] [GpuLayer] block_3a_conv_1/convolution + activation_10/Relu
[INFO] [GpuLayer] block_3a_conv_shortcut/convolution
[INFO] [GpuLayer] block_3a_conv_2/convolution + add_5/add + activation_11/Relu
[INFO] [GpuLayer] block_3b_conv_1/convolution + activation_12/Relu
[INFO] [GpuLayer] block_3b_conv_shortcut/convolution
[INFO] [GpuLayer] block_3b_conv_2/convolution + add_6/add + activation_13/Relu
[INFO] [GpuLayer] block_4a_conv_1/convolution + activation_14/Relu
[INFO] [GpuLayer] block_4a_conv_shortcut/convolution
[INFO] [GpuLayer] block_4a_conv_2/convolution + add_7/add + activation_15/Relu
[INFO] [GpuLayer] block_4b_conv_1/convolution + activation_16/Relu
[INFO] [GpuLayer] block_4b_conv_shortcut/convolution
[INFO] [GpuLayer] block_4b_conv_2/convolution + add_8/add + activation_17/Relu
[INFO] [GpuLayer] output_bbox/convolution
[INFO] [GpuLayer] output_cov/convolution
[INFO] [GpuLayer] PWN(output_cov/Sigmoid)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +121, now: CPU 453, GPU 3460 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU -8, now: CPU 694, GPU 3452 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 45696
[INFO] Total Device Persistent Memory: 14752768
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 6 MiB, GPU 962 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 1.25836ms to assign 3 blocks to 26 nodes requiring 322097152 bytes.
[INFO] Total Activation Memory: 322097152
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +4, now: CPU 940, GPU 3088 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +1, now: CPU 940, GPU 3089 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +5, GPU +16, now: CPU 5, GPU 16 (MiB)

Even though I couldn’t see any error or any message mentioning a failure, when I checked the models/faciallandmark directory, I saw that the engine file had not been created. Could you offer your guidance, please?

Please check whether you have write access. I suggest changing to another folder.
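A quick way to test the write-access point before running tao-converter is to check the output directory directly. A small sketch (the directory path is just an example taken from the commands above):

```shell
# Report whether the current user can create files in the output directory
DIR=models/faciallandmark
if [ -d "$DIR" ] && [ -w "$DIR" ]; then
  echo "writable: $DIR"
else
  echo "not writable (or missing): $DIR"
fi
```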

Thanks for the suggestion. After solving the write access issue, I managed to create the engine file for faciallandmarks. But I couldn’t find out how to do the conversion for heartrate after referring to the link. Based on that reference, I tried this command:

/home/User/tcnv/tao-converter -k nvidia_tlt -t int8 -p input,1x3x36x36,1x3x72x72,1x3x72x72 -e models/heartrate/heartrate.etlt_b16_gpu0_fp16.engine models/heartrate/heartrate.etlt

But, i got this output:

[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3855 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3884 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 278 MiB, GPU 3894 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileCBvqWr
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.8.4
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights dense_1/kernel/read__27 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[ERROR] Number of optimization profiles does not match model input node number.
Aborted (core dumped)

Could you offer your advice, please?

Glad to know fpenet works now.

For heartratenet, please run

    tao-converter -k nvidia_tlt -t fp16 \
        -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
        -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
        -e model.engine \
        model.etlt

Thank you for your kind advice.

After running this command for heartratenet:

/home/User/tcnv/tao-converter -k nvidia_tlt -t fp16 -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -p motion_input:0,1x1x3x72x72,1x1x3x72x72,1x2x3x72x72 -e heartrate.etlt_b16_gpu0_fp16.engine heartrate.etlt

I managed to create the engine, and this is the output I got:

[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3831 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3831 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 3860 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileqTkTeF
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.8.4
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights dense_1/kernel/read__27 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: motion_input:0
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] motion_conv1/BiasAdd
[INFO] [GpuLayer] appearance_conv1/BiasAdd
[INFO] [GpuLayer] PWN(motion_conv1_act/Tanh)
[INFO] [GpuLayer] PWN(appearance_conv1_act/Tanh)
[INFO] [GpuLayer] motion_conv2/BiasAdd
[INFO] [GpuLayer] appearance_conv2/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv2_act/Tanh)
[INFO] [GpuLayer] average_pooling2d_2/AvgPool
[INFO] [GpuLayer] attention_1/convolution
[INFO] [GpuLayer] appearance_conv3/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv3_act/Tanh)
[INFO] [GpuLayer] appearance_conv4/BiasAdd
[INFO] [GpuLayer] PWN(PWN(attention_1_act/Sigmoid), PWN(PWN(motion_conv2_act/Tanh), multiply_1/mul))
[INFO] [GpuLayer] PWN(appearance_conv4_act/Tanh)
[INFO] [GpuLayer] average_pooling2d_1/AvgPool
[INFO] [GpuLayer] attention_2/convolution
[INFO] [GpuLayer] motion_conv3/BiasAdd
[INFO] [GpuLayer] PWN(motion_conv3_act/Tanh)
[INFO] [GpuLayer] motion_conv4/BiasAdd
[INFO] [GpuLayer] PWN(PWN(motion_conv4_act/Tanh), PWN(PWN(attention_2_act/Sigmoid), multiply_2/mul))
[INFO] [GpuLayer] average_pooling2d_3/AvgPool
[INFO] [GpuLayer] flatten_1/Reshape + (Unnamed Layer* 46) [Shuffle]
[INFO] [GpuLayer] dense_1/MatMul + dense_1/bias/read__28 + (Unnamed Layer* 53) [Shuffle] + unsqueeze_node_after_dense_1/bias/read__28 + (Unnamed Layer* 53) [Shuffle]_(Unnamed Layer* 53) [Shuffle]_output + dense_1/BiasAdd
[INFO] [GpuLayer] copied_squeeze_after_dense_1/BiasAdd + lambda_1/Squeeze
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +28, now: CPU 436, GPU 3890 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU -108, now: CPU 677, GPU 3782 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Detected 2 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 18528
[INFO] Total Device Persistent Memory: 818688
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 144 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 1.68081ms to assign 4 blocks to 26 nodes requiring 1918976 bytes.
[INFO] Total Activation Memory: 1918976
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +4, now: CPU 930, GPU 3292 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 930, GPU 3292 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)

Thank you so much for all the help you have offered.

P.S. I should add that there was something quirky about the conversion. As you might notice from the command I used, I had to add an extra dimension to the input shapes of motion_input:0, because when I ran the command, it tended to ignore the first dimension in the shapes of motion_input:0.
