$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_heartratenet/versions/deployable_v1.0/files/model.etlt
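# tlt-converter flags used below (a brief gloss; see `./tlt-converter -h` for the full list):
#   -k  model load key (nvidia_tlt for NGC deployable models)
#   -t  engine precision (fp16 here)
#   -p  optimization profile for each dynamic-shape input: <name>,<min>,<opt>,<max> shapes
#   -e  output path for the serialized TensorRT engine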
$ ./tlt-converter -k nvidia_tlt -t fp16 -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -p motion_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -e ./model.plan ./model.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +356, GPU +0, now: CPU 374, GPU 6938 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileUQNEOa
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.6.3
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:382: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: motion_input:0
[INFO] [MemUsageSnapshot] Builder begin: CPU 389 MiB, GPU 6968 MiB
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] appearance_conv1/BiasAdd
[INFO] [GpuLayer] motion_conv1/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv1_act/Tanh)
[INFO] [GpuLayer] PWN(motion_conv1_act/Tanh)
[INFO] [GpuLayer] motion_conv2/BiasAdd
[INFO] [GpuLayer] appearance_conv2/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv2_act/Tanh)
[INFO] [GpuLayer] attention_1/convolution
[INFO] [GpuLayer] PWN(PWN(attention_1_act/Sigmoid), PWN(PWN(motion_conv2_act/Tanh), multiply_1/mul))
[INFO] [GpuLayer] average_pooling2d_1/AvgPool
[INFO] [GpuLayer] motion_conv3/BiasAdd
[INFO] [GpuLayer] average_pooling2d_2/AvgPool
[INFO] [GpuLayer] appearance_conv3/BiasAdd
[INFO] [GpuLayer] PWN(motion_conv3_act/Tanh)
[INFO] [GpuLayer] PWN(appearance_conv3_act/Tanh)
[INFO] [GpuLayer] appearance_conv4/BiasAdd
[INFO] [GpuLayer] motion_conv4/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv4_act/Tanh)
[INFO] [GpuLayer] attention_2/convolution
[INFO] [GpuLayer] PWN(PWN(motion_conv4_act/Tanh), PWN(PWN(attention_2_act/Sigmoid), multiply_2/mul))
[INFO] [GpuLayer] average_pooling2d_3/AvgPool
[INFO] [GpuLayer] flatten_1/Reshape + (Unnamed Layer* 46) [Shuffle]
[INFO] [GpuLayer] dense_1/MatMul
[INFO] [GpuLayer] dense_1/bias:0 + (Unnamed Layer* 53) [Shuffle] + unsqueeze_node_after_dense_1/bias:0 + (Unnamed Layer* 53) [Shuffle] + dense_1/BiasAdd + dense_1/Tanh
[INFO] [GpuLayer] dense_2/MatMul + dense_2/bias:0 + (Unnamed Layer* 69) [Shuffle] + unsqueeze_node_after_dense_2/bias:0 + (Unnamed Layer* 69) [Shuffle] + dense_2/BiasAdd
[INFO] [GpuLayer] copied_squeeze_after_dense_2/BiasAdd + lambda_1/Squeeze
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +285, now: CPU 616, GPU 7253 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU +393, now: CPU 923, GPU 7646 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 2 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 30336
[INFO] Total Device Persistent Memory: 487936
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 10 MiB, GPU 250 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +5, now: CPU 1422, GPU 8338 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1422, GPU 8346 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1422, GPU 8333 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1422, GPU 8313 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 1421 MiB, GPU 8313 MiB
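As a quick sanity check, trtexec can deserialize the resulting model.plan and time a few inference passes. This is a minimal sketch, assuming the stock JetPack install path for trtexec; when no --shapes argument is passed, trtexec reports the dynamic batch dimension and overrides it to 1, which matches the profile's min shape here:

$ /usr/src/tensorrt/bin/trtexec --loadEngine=./model.plan

Note the DLA warning in the build log above: because the optimization profile's min, opt, and max shapes differ (batch 1/1/2), every layer fell back to the GPU. Pinning all three shapes in -p to the same value is required for any layer to be placed on DLA.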