Thank you for your kind advice.
After running this command for heartratenet:
home/User/tcnv/tao-converter -k nvidia_tlt -t fp16 -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 -p motion_input:0,1x1x3x72x72,1x1x3x72x72,1x2x3x72x72 -e heartrate.etlt_b16_gpu0_fp16.engine heartrate.etlt
I managed to create the engine, and this is the output I got:
[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3831 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3831 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 3860 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileqTkTeF
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.8.4
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights dense_1/kernel/read__27 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Detected input dimensions from the model: (-1, 3, 72, 72)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: appearance_input:0
[INFO] Using optimization profile min shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile opt shape: (1, 3, 72, 72) for input: motion_input:0
[INFO] Using optimization profile max shape: (2, 3, 72, 72) for input: motion_input:0
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] motion_conv1/BiasAdd
[INFO] [GpuLayer] appearance_conv1/BiasAdd
[INFO] [GpuLayer] PWN(motion_conv1_act/Tanh)
[INFO] [GpuLayer] PWN(appearance_conv1_act/Tanh)
[INFO] [GpuLayer] motion_conv2/BiasAdd
[INFO] [GpuLayer] appearance_conv2/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv2_act/Tanh)
[INFO] [GpuLayer] average_pooling2d_2/AvgPool
[INFO] [GpuLayer] attention_1/convolution
[INFO] [GpuLayer] appearance_conv3/BiasAdd
[INFO] [GpuLayer] PWN(appearance_conv3_act/Tanh)
[INFO] [GpuLayer] appearance_conv4/BiasAdd
[INFO] [GpuLayer] PWN(PWN(attention_1_act/Sigmoid), PWN(PWN(motion_conv2_act/Tanh), multiply_1/mul))
[INFO] [GpuLayer] PWN(appearance_conv4_act/Tanh)
[INFO] [GpuLayer] average_pooling2d_1/AvgPool
[INFO] [GpuLayer] attention_2/convolution
[INFO] [GpuLayer] motion_conv3/BiasAdd
[INFO] [GpuLayer] PWN(motion_conv3_act/Tanh)
[INFO] [GpuLayer] motion_conv4/BiasAdd
[INFO] [GpuLayer] PWN(PWN(motion_conv4_act/Tanh), PWN(PWN(attention_2_act/Sigmoid), multiply_2/mul))
[INFO] [GpuLayer] average_pooling2d_3/AvgPool
[INFO] [GpuLayer] flatten_1/Reshape + (Unnamed Layer* 46) [Shuffle]
[INFO] [GpuLayer] dense_1/MatMul + dense_1/bias/read__28 + (Unnamed Layer* 53) [Shuffle] + unsqueeze_node_after_dense_1/bias/read__28 + (Unnamed Layer* 53) [Shuffle]_(Unnamed Layer* 53) [Shuffle]_output + dense_1/BiasAdd
[INFO] [GpuLayer] copied_squeeze_after_dense_1/BiasAdd + lambda_1/Squeeze
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +28, now: CPU 436, GPU 3890 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU -108, now: CPU 677, GPU 3782 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Detected 2 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 18528
[INFO] Total Device Persistent Memory: 818688
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 144 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 1.68081ms to assign 4 blocks to 26 nodes requiring 1918976 bytes.
[INFO] Total Activation Memory: 1918976
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +4, now: CPU 930, GPU 3292 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 930, GPU 3292 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
Thank you so much for all the help you have offered.
P.S. I should add that there was something quirky about the conversion. As you might notice from the command I used, I had to add an extra leading dimension to the input shapes of motion_input:0, because when I run the converter it seems to drop the first dimension of the motion_input:0 shapes.
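For anyone who hits the same behaviour, here is the general shape of the command I ended up with. As I understand it from the TAO docs, each -p flag takes a comma-separated input name followed by the min, opt, and max shapes for one dynamic input (these become the optimization profile shapes shown in the log above). The extra leading 1 on the motion_input:0 shapes is my workaround, not part of the documented syntax:

```shell
# One -p flag per dynamic input:
#   -p <input_name>,<min_shape>,<opt_shape>,<max_shape>
# appearance_input:0 takes plain NCHW shapes; motion_input:0 only worked
# with an extra leading dimension prepended (see the P.S. above).
home/User/tcnv/tao-converter -k nvidia_tlt -t fp16 \
    -p appearance_input:0,1x3x72x72,1x3x72x72,2x3x72x72 \
    -p motion_input:0,1x1x3x72x72,1x1x3x72x72,1x2x3x72x72 \
    -e heartrate.etlt_b16_gpu0_fp16.engine \
    heartrate.etlt
```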