TLT-converter for HeartRateNet model error

Hello. I’m trying to convert the HeartRateNet deployable model (.etlt). After running this command:

    ./tlt-converter -k nvidia_tlt -p input, 1x3x72x72, 2x3x72x72, 4x3x72x72 model.etlt -t fp16 -e model_onnx_b16

it gives me this:

[Warning] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[ERROR] Number of optimization profile does not match model input node number.

Can anybody tell me what I’m doing wrong and how to fix it?

You can download TLT CV Inference Pipeline Quick Start script via

ngc registry resource download-version "nvidia/tlt_cv_inference_pipeline_quick_start:v0.2-ga"

https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/tlt_cv_inf_pipeline/requirements_and_installation.html#download-the-tlt-cv-inference-pipeline-quick-start

Then, in scripts/tlt_cv_compile.sh, you can refer to the snippet below to generate the TensorRT engine.

        tlt-converter -k ${ENCODING_KEY} -t fp16 \
            -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
            -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
            -e ${repo_location}/heartrate_two_branch_tlt/1/model.plan \
            /models/tlt_heartratenet_v${tlt_model_version_heartrate}/model.etlt
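For reference, the snippet above assumes a few shell variables are already set by the script. A minimal sketch with hypothetical placeholder values (the paths and version below are assumptions, not taken from tlt_cv_compile.sh — substitute your own):

```shell
# Hypothetical placeholder values for the variables used by the
# tlt_cv_compile.sh snippet above -- substitute your own key, paths,
# and model version.
ENCODING_KEY=nvidia_tlt                      # key the .etlt model was exported with
repo_location=/opt/triton/model_repository   # assumed model repository root
tlt_model_version_heartrate=1.0              # assumed deployable model version
echo "key=${ENCODING_KEY} repo=${repo_location} version=${tlt_model_version_heartrate}"
```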

Hello. When I run this command: bash tlt_cv_compile.sh -m heartrate, it gives me this error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

I’m doing this on a Jetson AGX Xavier; sorry for not mentioning that earlier.

Could you run the command below to generate the TensorRT engine (i.e., model.plan) directly?

    tlt-converter -k ${ENCODING_KEY} -t fp16 \
        -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
        -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
        -e ${repo_location}/heartrate_two_branch_tlt/1/model.plan \
        /models/tlt_heartratenet_v${tlt_model_version_heartrate}/model.etlt
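Note that each -p profile takes three shapes (min, opt, max), each in NxCxHxW format with no spaces. As a quick sanity check before running the converter, a hypothetical helper (not part of tlt-converter) could validate a shape string:

```shell
# Hypothetical helper (not part of tlt-converter): check that a shape
# string is four positive integers separated by 'x', e.g. 1x3x72x72.
is_valid_shape() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+x[0-9]+x[0-9]+x[0-9]+$'
}

is_valid_shape "1x3x72x72"  && echo "ok"
is_valid_shape "1x3x72x72 " || echo "rejected: trailing space"
is_valid_shape "3x72x72"    || echo "rejected: only three dimensions"
```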

For the error you mentioned, please check the software requirements in Requirements and Installation — Transfer Learning Toolkit 3.0 documentation.

If I run it directly, it gives me this:

Please provide three optimization profiles via -p <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has the format: nxcxhxw
Aborted (core dumped)

Can you share the command you ran? Please share the logs too.

    ./tlt-converter -k nvidia_tlt -t fp16
    -p appearance_input: 0, 1x3x72x72, 1x3x72x72, 2x3x72x72
    -p motion_input: 0, 1x3x72x72, 1x3x72x72, 2x3x72x72
    -e model/model.etlt

The "-e" argument is the TensorRT engine you want to generate; it is not the .etlt model. Please check the usage via "$ tlt-converter -h".
Please try the command below.
./tlt-converter -k nvidia_tlt -t fp16 -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -e ./model.plan model/model.etlt
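A side note on the earlier command: the spaces after the colon and commas are themselves a problem, because the shell splits arguments on whitespace, so each shape becomes a separate argument instead of part of the -p value. A small illustration:

```shell
# The shell splits on whitespace: a space after each comma turns one
# -p value into several separate arguments.
with_spaces=$(set -- appearance_input:0, 1x3x72x72, 1x3x72x72, 2x3x72x72; echo $#)
without_spaces=$(set -- appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72; echo $#)
echo "with spaces: $with_spaces arguments; without spaces: $without_spaces argument"
# prints: with spaces: 4 arguments; without spaces: 1 argument
```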

Segmentation fault (core dumped)

Can you share the full log?
And where did you get the tlt-converter?

I installed tlt-converter with this command: wget https://developer.nvidia.com/tlt-converter-trt71. And how do I share logs?

I mean the full log from when you get "Segmentation fault (core dumped)". Please share the command again.

./tlt-converter -k nvidia_tlt -d 3,72,72 -t fp16 -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -e ./model.plan model/model.etlt

Can you remove "-d 3,72,72"?

If I do that, it says: No input dimensions given

It seems that you have not downloaded the correct version of tlt-converter for Xavier.
See the TensorRT — Transfer Learning Toolkit 3.0 documentation.
For Jetson devices, please download it from
https://developer.nvidia.com/cuda102-trt71-jp44
or
https://developer.nvidia.com/cuda102-trt71-jp45
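One way to rule out an architecture mismatch (an assumption worth checking, since an x86_64 build will not run on the AGX Xavier's ARM CPU):

```shell
# On Jetson AGX Xavier, `uname -m` should print aarch64; the
# tlt-converter binary must be built for the same architecture.
uname -m
# `file` reports which architecture a binary targets, if it is present:
file ./tlt-converter 2>/dev/null || echo "tlt-converter not found in current directory"
```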

I downloaded the right version and ran the command, but I get the same error.

Please share your latest command and log.

I downloaded the official model from the HeartRateNet model card on NVIDIA NGC; there is no issue.
See the log below.

$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_heartratenet/versions/deployable_v1.0/files/model.etlt

$ ./tlt-converter -k nvidia_tlt -t fp16 -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -e ./model.plan ./model.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +356, GPU +0, now: CPU 374, GPU 6938 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileUQNEOa
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.6.3
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:382: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: motion_input:0
[INFO] [MemUsageSnapshot] Builder begin: CPU 389 MiB, GPU 6968 MiB
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] appearance_conv1/BiasAdd
[INFO] [GpuLayer] motion_conv1/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv1_act/Tanh)
[INFO] [GpuLayer] PWN(motion_conv1_act/Tanh)
[INFO] [GpuLayer] motion_conv2/BiasAdd
[INFO] [GpuLayer] appearance_conv2/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv2_act/Tanh)
[INFO] [GpuLayer] attention_1/convolution
[INFO] [GpuLayer] PWN(PWN(attention_1_act/Sigmoid), PWN(PWN(motion_conv2_act/Tanh), multiply_1/mul))
[INFO] [GpuLayer] average_pooling2d_1/AvgPool
[INFO] [GpuLayer] motion_conv3/BiasAdd
[INFO] [GpuLayer] average_pooling2d_2/AvgPool
[INFO] [GpuLayer] appearance_conv3/BiasAdd
[INFO] [GpuLayer] PWN(motion_conv3_act/Tanh)
[INFO] [GpuLayer] PWN(appearance_conv3_act/Tanh)
[INFO] [GpuLayer] appearance_conv4/BiasAdd
[INFO] [GpuLayer] motion_conv4/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv4_act/Tanh)
[INFO] [GpuLayer] attention_2/convolution
[INFO] [GpuLayer] PWN(PWN(motion_conv4_act/Tanh), PWN(PWN(attention_2_act/Sigmoid), multiply_2/mul))
[INFO] [GpuLayer] average_pooling2d_3/AvgPool
[INFO] [GpuLayer] flatten_1/Reshape + (Unnamed Layer* 46) [Shuffle]
[INFO] [GpuLayer] dense_1/MatMul
[INFO] [GpuLayer] dense_1/bias:0 + (Unnamed Layer* 53) [Shuffle] + unsqueeze_node_after_dense_1/bias:0 + (Unnamed Layer* 53) [Shuffle] + dense_1/BiasAdd + dense_1/Tanh
[INFO] [GpuLayer] dense_2/MatMul + dense_2/bias:0 + (Unnamed Layer* 69) [Shuffle] + unsqueeze_node_after_dense_2/bias:0 + (Unnamed Layer* 69) [Shuffle] + dense_2/BiasAdd
[INFO] [GpuLayer] copied_squeeze_after_dense_2/BiasAdd + lambda_1/Squeeze
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +285, now: CPU 616, GPU 7253 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU +393, now: CPU 923, GPU 7646 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 2 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 30336
[INFO] Total Device Persistent Memory: 487936
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 10 MiB, GPU 250 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +5, now: CPU 1422, GPU 8338 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1422, GPU 8346 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1422, GPU 8333 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1422, GPU 8313 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 1421 MiB, GPU 8313 MiB

Thank you for your help !