GazeNet - Tao_converter [ERROR] input_left_images:0: number of dimensions is 4 but profile 0 has 3

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Jetson Xavier
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
GazeNet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
8.2.1
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Using the deepstream_tao_apps models and tao-converter, I’m trying to generate the model engine file manually.

Start command:

sudo ./tao-converter -e /opt/nvidia/deepstream/deepstream/sources/deepstream_tao_apps/models/gazenet/gazenet_facegrid.etlt_b8_gpu0_fp16.engine \
-p input_right_images:0,1x1x224x224,8x1x224x224,8x1x224x224 \
-p input_facegrid:0,1x1x625x1,8x1x625x1,8x1x625x1 \
-p input_face_images:0,1x1x224x224,8x1x224x224,8x1x224x224 \
-p input_left_images:0,1x1x224x224,8x1x224x224,8x1x224x224 \
-t fp16 -k nvidia_tlt -m 8 nvidia_tlt /opt/nvidia/deepstream/deepstream/sources/deepstream_tao_apps/models/gazenet/gazenet_facegrid.etlt

output:

[INFO] [MemUsageChange] Init CUDA: CPU +363, GPU +0, now: CPU 381, GPU 14483 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 381 MiB, GPU 14484 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 486 MiB, GPU 14605 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filetyyxGm
[INFO] ONNX IR version:  0.0.5
[INFO] Opset version:    10
[INFO] Producer name:    tf2onnx
[INFO] Producer version: 1.6.3
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights fg_fc1_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights fg_fc2_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights fc1_e1_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights fc1_f_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights fc2_f_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights xyz_fc_top_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights xyz_fc/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights tp_fc_top_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] ShapedWeights.cpp:173: Weights tp_fc/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[INFO] Detected input dimensions from the model: (-1, 1, 224, 224)
[INFO] Detected input dimensions from the model: (-1, 1, 625, 1)
[INFO] Detected input dimensions from the model: (-1, 1, 224, 224)
[INFO] Detected input dimensions from the model: (-1, 1, 224, 224)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 224, 224) for input: input_right_images:0
[INFO] Using optimization profile opt shape: (8, 1, 224, 224) for input: input_right_images:0
[INFO] Using optimization profile max shape: (8, 1, 224, 224) for input: input_right_images:0
[INFO] Using optimization profile min shape: (1, 625, 1) for input: input_facegrid:0
[INFO] Using optimization profile opt shape: (1, 625, 1) for input: input_facegrid:0
[INFO] Using optimization profile max shape: (1, 625, 1) for input: input_facegrid:0
[INFO] Using optimization profile min shape: (1, 224, 224) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (1, 224, 224) for input: input_face_images:0
[INFO] Using optimization profile max shape: (1, 224, 224) for input: input_face_images:0
[INFO] Using optimization profile min shape: (1, 224, 224) for input: input_left_images:0
[INFO] Using optimization profile opt shape: (1, 224, 224) for input: input_left_images:0
[INFO] Using optimization profile max shape: (1, 224, 224) for input: input_left_images:0
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[ERROR] 4: [network.cpp::validate::2951] Error Code 4: Internal Error (input_left_images:0: number of dimensions is 4 but profile 0 has 3.)
[ERROR] Unable to create engine
Segmentation fault
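For context, TensorRT requires every optimization-profile shape to have the same number of dimensions as the corresponding network input. All four inputs here are rank-4 (batch x channel x height x width), and the command above supplies rank-4 shapes for each one, yet the log shows 3-dimensional profile shapes such as (1, 224, 224) being applied to every input after the first, which is exactly what Error Code 4 complains about. A minimal, hypothetical sanity check for `-p` strings (the `check_profile` helper is illustrative, not part of tao-converter):

```python
# Hypothetical helper: a tao-converter -p argument has the form
# "name,minShape,optShape,maxShape" with shapes written like "1x1x224x224".
# Every shape must have the same rank as the model input it targets.
def check_profile(arg: str, expected_rank: int) -> bool:
    name, *shapes = arg.split(",")
    return all(len(s.split("x")) == expected_rank for s in shapes)

# Rank-4 shapes for a rank-4 input pass the check:
assert check_profile(
    "input_left_images:0,1x1x224x224,8x1x224x224,8x1x224x224", 4)
# Shapes missing the batch dimension (as in the log above) do not:
assert not check_profile(
    "input_left_images:0,1x224x224,8x224x224,8x224x224", 4)
```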

input file:
gazenet_facegrid.etlt (17.3 MB)

Could you double-check? I ran the command below, and there was no issue.

./tao-converter -e out.engine -k nvidia_tlt \
-p input_right_images:0,1x1x224x224,4x1x224x224,8x1x224x224 \
-p input_left_images:0,1x1x224x224,4x1x224x224,8x1x224x224 \
-p input_face_images:0,1x1x224x224,4x1x224x224,8x1x224x224 \
-p input_facegrid:0,1x1x625x1,4x1x625x1,8x1x625x1 \
gazenet_facegrid.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +328, GPU +0, now: CPU 340, GPU 739 (MiB)
[INFO] [MemUsageChange] Init builder kernel library: CPU +442, GPU +116, now: CPU 837, GPU 855 (MiB)
[WARNING] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in CUDA C++ Programming Guide
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/fileJ06Iqu
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.6.3
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 1, 224, 224)
[INFO] Detected input dimensions from the model: (-1, 1, 224, 224)
[INFO] Detected input dimensions from the model: (-1, 1, 224, 224)
[INFO] Detected input dimensions from the model: (-1, 1, 625, 1)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 224, 224) for input: input_right_images:0
[INFO] Using optimization profile opt shape: (4, 1, 224, 224) for input: input_right_images:0
[INFO] Using optimization profile max shape: (8, 1, 224, 224) for input: input_right_images:0
[INFO] Using optimization profile min shape: (1, 1, 224, 224) for input: input_left_images:0
[INFO] Using optimization profile opt shape: (4, 1, 224, 224) for input: input_left_images:0
[INFO] Using optimization profile max shape: (8, 1, 224, 224) for input: input_left_images:0
[INFO] Using optimization profile min shape: (1, 1, 224, 224) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (4, 1, 224, 224) for input: input_face_images:0
[INFO] Using optimization profile max shape: (8, 1, 224, 224) for input: input_face_images:0
[INFO] Using optimization profile min shape: (1, 1, 625, 1) for input: input_facegrid:0
[INFO] Using optimization profile opt shape: (4, 1, 625, 1) for input: input_facegrid:0
[INFO] Using optimization profile max shape: (8, 1, 625, 1) for input: input_facegrid:0
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +854, GPU +362, now: CPU 1715, GPU 1217 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1841, GPU 1275 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[INFO] Total Activation Memory: 1199532032
[INFO] Detected 4 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 88160
[INFO] Total Device Persistent Memory: 56832
[INFO] Total Scratch Memory: 8577536
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 8 MiB, GPU 439 MiB
[INFO] [BlockAssignment] Started assigning block shifts. This will take 117 steps to complete.
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 18.5063ms to assign 30 blocks to 117 nodes requiring 47802368 bytes.
[INFO] Total Activation Memory: 47802368
[INFO] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 2411, GPU 1543 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine:

I can also run the command below successfully.

./tao-converter -e out.engine -k nvidia_tlt \
-p input_right_images:0,1x1x224x224,8x1x224x224,8x1x224x224 \
-p input_left_images:0,1x1x224x224,8x1x224x224,8x1x224x224 \
-p input_face_images:0,1x1x224x224,8x1x224x224,8x1x224x224 \
-p input_facegrid:0,1x1x625x1,8x1x625x1,8x1x625x1 \
gazenet_facegrid.etlt
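Since the four profile strings differ only in input name and base shape, they can be assembled programmatically. A small illustrative Python helper (not part of any TAO tooling) that builds the `-p` arguments for the inputs named above:

```python
def profile_args(inputs, opt_batch=4, max_batch=8):
    """Build tao-converter -p arguments of the form name,minShape,optShape,maxShape."""
    args = []
    for name, dims in inputs.items():
        # One shape string per batch size: min (1), opt, and max.
        shapes = ["x".join(str(d) for d in (b, *dims))
                  for b in (1, opt_batch, max_batch)]
        args += ["-p", f"{name},{','.join(shapes)}"]
    return args

# Input names and per-sample shapes taken from the GazeNet log above.
inputs = {
    "input_right_images:0": (1, 224, 224),
    "input_left_images:0": (1, 224, 224),
    "input_face_images:0": (1, 224, 224),
    "input_facegrid:0": (1, 625, 1),
}
print(" ".join(profile_args(inputs)))
```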

I copied the .etlt file to the tao-converter directory and ran your command. I still got the same error.

Here’s a photo of my terminal window:

For some reason, the batch dimension of my input layers is dropped after the ‘input_right_images:0’ layer’s optimization profile.

Can you try this tao-converter?
wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-converter/versions/v4.0.0_trt8.5.2.2_aarch64/files/tao-converter'

From TAO Converter | NVIDIA NGC

This worked.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.