Convert model to Jetson: error during model export step in TAO notebook

Hey all,
I'm having a problem with step #10 (model export) in the TAO notebook.
The instructions say: “For the Jetson devices, please download the tao-converter for Jetson from the dev zone link here.”
So I downloaded the “Jetson” converter, pulled the .etlt and .bin files that were trained on the server, and followed the README instructions. So far so good, but executing ./tao-converter -h outputs the following message:
./tao-converter: error while loading shared libraries: libnvinfer.so.7: cannot open shared object file: No such file or directory

Side note: the relevant JetPack packages installed on the device are:

ii  libnvidia-container0:arm64       0.10.0+jetpack                             arm64        NVIDIA container runtime library
ii  nvidia-container-csv-cuda        10.2.460-1                                 arm64        Jetpack CUDA CSV file
ii  nvidia-container-csv-cudnn       8.2.1.32-1+cuda10.2                        arm64        Jetpack CUDNN CSV file
ii  nvidia-container-csv-tensorrt    8.0.1.6-1+cuda10.2                         arm64        Jetpack TensorRT CSV file
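
Note the mismatch: the binary is asking for libnvinfer.so.7 (TensorRT 7), while the package list above shows TensorRT 8.0.1.6 installed. A quick way to double-check which libnvinfer major version is actually on the device (standard commands, nothing TAO-specific):

$ldconfig -p | grep libnvinfer
$dpkg -l | grep nvinfer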

Did you download the correct version of tao-converter?
See more info in TensorRT — TAO Toolkit 3.21.11 documentation
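
As a rule of thumb, the converter build name encodes the JetPack and TensorRT versions it was linked against (e.g. tao-converter-jp46-trt8.0.1.6 for JetPack 4.6 with TensorRT 8.0.1.6). A minimal sketch of setting up a freshly downloaded build:

$chmod +x tao-converter
$./tao-converter -h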

Hey,
The default download from the link on the TAO converter page gets the ‘clara_agx’ converter, so I followed your suggestion and downloaded the ‘tao-converter-jp46’ version directly from the link you gave.
(For the sake of clarity: the model was trained on the server, and I copied the .etlt and .bin files to the Jetson and used the converter there.)

But it still didn't work. Here is the log:

$./tao-converter -k $KEY -p Input,1x3x480x640,8x3x480x640,16x3x480x640 -c export/cal.bin -e export/trt.engine -b 8 -t int8 export/yolov4_cspdarknet_tiny_epoch_080.etlt

[INFO] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 7687 (MiB)
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Interpreting non ascii codepoint 224.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Expected identifier, got: �
[ERROR] ModelImporter.cpp:682: Failed to parse ONNX model from file: /tmp/fileLEriKK
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[INFO] Model has no dynamic shape.
[ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

The files in the export folder:

$ls export/
cal.bin  yolov4_cspdarknet_tiny_epoch_080.etlt

What could be the problem?

Did you train yolov4_cspdarknet_tiny_epoch_080.etlt yourself?

Can you download an official yolov4 model and retry?

wget https://nvidia.box.com/shared/static/511552h6b1ecw4gd20ptuihoiidz13cs -O models.zip

Its input size is 960x544
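
For reference, a minimal sketch of trying that official model (the .etlt filename inside the zip and the encryption key are placeholders here; substitute the key from the model's documentation; also note that a 960x544 WxH input corresponds to 3x544x960 in CHW order):

$wget https://nvidia.box.com/shared/static/511552h6b1ecw4gd20ptuihoiidz13cs -O models.zip
$unzip models.zip
$./tao-converter -k <key> \
               -p Input,1x3x544x960,8x3x544x960,16x3x544x960 \
               -t fp16 \
               -e trt_official.engine \
               <official_yolov4>.etlt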

I downloaded the model from the NVIDIA repo:

!ngc registry model download-version nvidia/tao/pretrained_object_detection:cspdarknet_tiny \
                   --dest $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny

and trained it on custom data. I need to use the tiny model since I need to keep the computational load low.

BTW, executing the same conversion on the server works fine and creates a trt.engine file:

!tao converter -k $KEY  \
                  -p Input,1x3x480x640,8x3x480x640,16x3x480x640 \
                  -c $USER_EXPERIMENT_DIR/export/cal.bin \
                  -e $USER_EXPERIMENT_DIR/export/trt.engine \
                  -b 8 \
                  -t int8 \
                  $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_epoch_080.etlt

output:

2022-01-24 14:31:11,579 [INFO] root: Registry: ['nvcr.io']
2022-01-24 14:31:11,657 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-24 14:31:11,673 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[INFO] [MemUsageChange] Init CUDA: CPU +252, GPU +0, now: CPU 258, GPU 487 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileCxM6sl
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] /trt_oss_src/TensorRT/parsers/onnx/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[INFO] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 480, 640)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 480, 640) for input: Input
[INFO] Using optimization profile opt shape: (8, 3, 480, 640) for input: Input
[INFO] Using optimization profile max shape: (16, 3, 480, 640) for input: Input
[INFO] [MemUsageSnapshot] Builder begin: CPU 280 MiB, GPU 487 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.

UPDATE: I ran the export step again on the server, with the following command:

!tao yolo_v4_tiny export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.tlt  \
                   -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_epoch_$EPOCH.etlt \
                   -e $SPECS_DIR/yolo_v4_tiny_retrain_chimera.txt \
                   -k $KEY \
                   --data_type int8 \
                   --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin

and ran the converter step again with the new files on the NX, and got the following error:

./tao-converter -k $KEY -p Input,1x3x480x640,8x3x480x640,16x3x480x640 -c export/cal.bin -e export/trt.engine -b 8 -t int8 export/yolov4_cspdarknet_tiny_epoch_080.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 7054 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileP2LjEh
[INFO] ONNX IR version:  0.0.0
[INFO] Opset version:    0
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[INFO] Model has no dynamic shape.
[ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
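
One quick sanity check worth doing here, to rule out the .etlt getting corrupted while being copied to the NX: compare checksums on the server and on the device (standard tooling, nothing TAO-specific):

$md5sum export/cal.bin export/yolov4_cspdarknet_tiny_epoch_080.etlt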

How should I continue from here?

Please check if you downloaded the correct version of tao-converter on your NX, based on the JetPack version.

Hey, thanks for the reply.
I already checked that in an earlier message: I downloaded the ‘tao-converter-jp46-trt8.0.1.6’ version, since I'm using JetPack 4.6.
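
For completeness, the JetPack/L4T release on the device can be confirmed with the following (JetPack 4.6 corresponds to L4T R32.6.1):

$cat /etc/nv_tegra_release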

Can you try again without the “-p” option?

Same response:

./tao-converter -k $KEY -c export/cal.bin -e export/trt.engine -b 8 -t int8 export/yolov4_cspdarknet_tiny_epoch_080.etlt

[INFO] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 7660 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileThnSk2
[INFO] ONNX IR version:  0.0.0
[INFO] Opset version:    0
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[INFO] Model has no dynamic shape.
[ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Can you try the officially released yolov4 model?
wget https://nvidia.box.com/shared/static/511552h6b1ecw4gd20ptuihoiidz13cs -O models.zip

Thanks @Morganh @AakankshaS, I'll try the official YOLOv4 model.

So I'm still stuck on the [ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output) error.

I read the tao-converter instructions, and they highlight that -d and -o are required, yet neither appears in the converter step of the notebook:
Using the tao-converter

tao-converter [-h] -k <encryption_key>
              -d <input_dimensions>
              -o <comma separated output nodes>

where:

-d: A comma-separated list of input dimensions that should match the dimensions used for tao yolo_v4_tiny export.

-o: A comma-separated list of output blob names that should match the output configuration used for tao yolo_v4_tiny export. For YOLOv4-tiny, set this argument to BatchedNMS.

Are they mandatory? If so, could you please give an example for those parameters? They are not part of the export command, contrary to the documentation's statement that they should “match the dimensions used for tao yolo_v4_tiny export”.

For example (for .etlt models exported with dynamic shape, as here, the -p optimization-profile option takes the place of -d):

$tao-converter -k $KEY \
              -p Input,1x3x384x1248,1x3x384x1248,1x3x384x1248 \
              -o BatchedNMS \
              -e trt_yolo_resnet18.fp32.engine \
              -t fp32 \
              -i nchw \
              -m 1 \
              -w 100000000 \
              yolov4_resnet18_epoch_065.etlt

Refer to Tao-converter convert yolo-v4.etlt to trt.engine error:no optimization profile has been defined - #4 by Morganh

So -d is not required?
The same error also appears with the -o option:

./tao-converter -k $KEY -p Input,1x3x480x640,8x3x480x640,16x3x480x640 -o BatchedNMS -e export/trt.engine -t int8 export/yolov4_cspdarknet_tiny_epoch_080.etlt 

[INFO] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 7036 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filenap2Rr
[INFO] ONNX IR version:  0.0.0
[INFO] Opset version:    0
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[INFO] Model has no dynamic shape.
[ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

BTW, is it a problem that the ONNX IR version is 0.0.0 and all the other versions are 0 as well?
And do you know whether anyone has succeeded in using tao-converter with yolov4_tiny on a Jetson NX?

I will check if I can run it successfully.

Can you use the explicit key instead of $KEY?
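
For context: if $KEY is not actually exported in the Jetson shell, it expands to an empty string, the .etlt cannot be decrypted, and the parser sees garbage, which would be consistent with the ONNX IR version reading 0.0.0 above. A quick check, with the key value below as a placeholder:

$echo "$KEY"          # empty output means the variable is not set in this shell
$./tao-converter -k 'my_actual_key' \
               -p Input,1x3x480x640,8x3x480x640,16x3x480x640 \
               -c export/cal.bin \
               -e export/trt.engine \
               -b 8 -t int8 \
               export/yolov4_cspdarknet_tiny_epoch_080.etlt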

Great input! That made the process run! But we are not done yet, because the process ended in an error.
Executing tao-converter with the -t int8 argument (as shown in the yolov4_tiny notebook) gives the following error output:

./tao-converter -k ... -p Input,1x3x480x640,8x3x480x640,16x3x480x640 -e export/trt.engine -b 8 -t int8 export/yolov4_cspdarknet_tiny_epoch_080.etlt

[INFO] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 6476 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileCnwAkI
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[INFO] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 480, 640)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 480, 640) for input: Input
[INFO] Using optimization profile opt shape: (8, 3, 480, 640) for input: Input
[INFO] Using optimization profile max shape: (16, 3, 480, 640) for input: Input
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[INFO] [MemUsageSnapshot] Builder begin: CPU 395 MiB, GPU 6546 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +188, now: CPU 622, GPU 6734 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU +313, now: CPU 929, GPU 7047 (MiB)
[WARNING] Calibration Profile is not defined. Running calibration with Profile 0
[INFO] Detected 1 inputs and 4 output network tensors.
[INFO] Total Host Persistent Memory: 6896
[INFO] Total Device Persistent Memory: 0
[INFO] Total Scratch Memory: 4134912
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 150 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +1, now: CPU 1404, GPU 7365 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1404, GPU 7365 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1404, GPU 7365 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1403, GPU 7365 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1403 MiB, GPU 7365 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1403, GPU 7365 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 1404, GPU 7365 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation end: CPU 1404 MiB, GPU 7478 MiB
[INFO] Starting Calibration.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1403, GPU 7469 (MiB)
[INFO]   Post Processing Calibration data in 3.776e-06 seconds.
[ERROR] 1: Unexpected exception _Map_base::at
[ERROR] Unable to create engine
Segmentation fault (core dumped)

This is despite the export step having been executed with --data_type int8, using the following command:

!tao yolo_v4_tiny export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.tlt  \
                   -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet_tiny_epoch_$EPOCH.etlt \
                   -e $SPECS_DIR/yolo_v4_tiny_retrain_chimera.txt \
                   -k $KEY \
                   --data_type int8 \
                   --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin

Any thoughts on what could be causing the converter step to fail?

Many thanks!

For int8 mode, please add the “-c” option to supply the cal.bin file.
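
That is, the full command from earlier in the thread, with the calibration cache supplied (key shown as a placeholder):

$./tao-converter -k <key> \
               -p Input,1x3x480x640,8x3x480x640,16x3x480x640 \
               -c export/cal.bin \
               -e export/trt.engine \
               -b 8 -t int8 \
               export/yolov4_cspdarknet_tiny_epoch_080.etlt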

Thanks for the quick response! That helped!

  1. Now I'm receiving the warning [INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output. How can I increase the tao-converter workspace size? (See the sketch after this list.)
  2. In this setup, what are the implications of executing the converter with -t fp16, or without the -t option at all?
  3. Is the tao-converter for Jetson the same for all the inference models (SSD, YOLOv3, YOLOv4, YOLOv4-tiny)? In each model's notebook, the converter is associated with the model name (tao yolov4 converter / tao yolov4_tiny converter / etc.).
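
A sketch for question 1, going by the example earlier in this thread: the -w option appears to control the workspace size in bytes (the 1 GiB value below is illustrative, and the key is a placeholder):

$./tao-converter -k <key> \
               -p Input,1x3x480x640,8x3x480x640,16x3x480x640 \
               -c export/cal.bin \
               -e export/trt.engine \
               -b 8 -t int8 \
               -w 1073741824 \
               export/yolov4_cspdarknet_tiny_epoch_080.etlt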

Best regards