Deploy TAO Classification_pyt FAN for Jetson Nano

Hi,

We have successfully used TAO with yolov4-tiny and classification_tf1 on the Jetson Nano. In that process we use the dedicated libraries from NVIDIA's sites, or for example DeepStream-Yolo for external YOLO models, or even models from the Darknet framework.

Last week we tried classification_pyt and got good results, but we cannot use the resulting model successfully on the Jetson Nano.
We tried to follow this page: https://docs.nvidia.com/tao/tao-toolkit/text/ds_tao/classification_ds.html#deepstream-configuration-file, but without success, because we get this error:

ERROR: Deserialize engine failed because file path: /home/jetson/tao/export/epoch_26.onnx.engine open error
0:00:02.711928044 13461   0x5589ace0f0 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1889> [UID = 2]: deserialize engine from file :/home/jetson/tao/export/epoch_26.onnx.engine failed
0:00:02.712051328 13461   0x5589ace0f0 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1996> [UID = 2]: deserialize backend context from engine from file :/home/jetson/tao/export/epoch_26.onnx.engine failed, try rebuild
0:00:02.712086745 13461   0x5589ace0f0 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<nvinfer1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 2]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: [TRT]: ModelImporter.cpp:720: While parsing node number 53 [Range -> "/backbone/pos_embed/Range_output_0"]:
ERROR: [TRT]: ModelImporter.cpp:721: --- Begin node ---
ERROR: [TRT]: ModelImporter.cpp:722: input: "/backbone/pos_embed/Constant_1_output_0"
input: "/backbone/pos_embed/Cast_output_0"
input: "/backbone/pos_embed/Constant_2_output_0"
output: "/backbone/pos_embed/Range_output_0"
name: "/backbone/pos_embed/Range"
op_type: "Range"

ERROR: [TRT]: ModelImporter.cpp:723: --- End node ---
ERROR: [TRT]: ModelImporter.cpp:726: ERROR: builtin_op_importers.cpp:3172 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
ERROR: Failed to parse onnx file
ERROR: failed to build network since parsing model errors.
Caught SIGSEGV
#0  0x0000007f92f0ed5c in __waitpid (pid=<optimized out>, stat_loc=0x7fd4f8ab54, options=<optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x0000007f92f4a2e0 in g_on_error_stack_trace ()
#2  0x0000005556afcc3c in  ()
#3  0x0000005589fcc620 in  ()
Spinning.  Please run 'gdb gst-launch-1.0 13461' to continue debugging, Ctrl-C to quit, or Ctrl-\ to dump core.

This error is from DeepStream. We know that TensorRT cannot handle some layers. Here is also the output of trtexec:

/usr/src/tensorrt/bin/trtexec --onnx=classification_model_export.onnx --maxShapes="input_1":16x3x224x224 --minShapes="input_1":1x3x224x224 --optShapes="input_1":8x3x224x224 --saveEngine=model_fan.engine

[04/04/2024-09:01:56] [I] === Model Options ===
[04/04/2024-09:01:56] [I] Format: ONNX
[04/04/2024-09:01:56] [I] Model: classification_model_export.onnx
[04/04/2024-09:01:56] [I] Output:
[04/04/2024-09:01:56] [I] === Build Options ===
[04/04/2024-09:01:56] [I] Max batch: explicit
[04/04/2024-09:01:56] [I] Workspace: 16 MiB
[04/04/2024-09:01:56] [I] minTiming: 1
[04/04/2024-09:01:56] [I] avgTiming: 8
[04/04/2024-09:01:56] [I] Precision: FP32
[04/04/2024-09:01:56] [I] Calibration:
[04/04/2024-09:01:56] [I] Refit: Disabled
[04/04/2024-09:01:56] [I] Sparsity: Disabled
[04/04/2024-09:01:56] [I] Safe mode: Disabled
[04/04/2024-09:01:56] [I] Restricted mode: Disabled
[04/04/2024-09:01:56] [I] Save engine: model_fan.engine
[04/04/2024-09:01:56] [I] Load engine:
[04/04/2024-09:01:56] [I] NVTX verbosity: 0
[04/04/2024-09:01:56] [I] Tactic sources: Using default tactic sources
[04/04/2024-09:01:56] [I] timingCacheMode: local
[04/04/2024-09:01:56] [I] timingCacheFile:
[04/04/2024-09:01:56] [I] Input(s)s format: fp32:CHW
[04/04/2024-09:01:56] [I] Output(s)s format: fp32:CHW
[04/04/2024-09:01:56] [I] Input build shape: input_1=1x3x224x224+8x3x224x224+16x3x224x224
[04/04/2024-09:01:56] [I] Input calibration shapes: model
[04/04/2024-09:01:56] [I] === System Options ===
[04/04/2024-09:01:56] [I] Device: 0
[04/04/2024-09:01:56] [I] DLACore:
[04/04/2024-09:01:56] [I] Plugins:
[04/04/2024-09:01:56] [I] === Inference Options ===
[04/04/2024-09:01:56] [I] Batch: Explicit
[04/04/2024-09:01:56] [I] Input inference shape: input_1=8x3x224x224
[04/04/2024-09:01:56] [I] Iterations: 10
[04/04/2024-09:01:56] [I] Duration: 3s (+ 200ms warm up)
[04/04/2024-09:01:56] [I] Sleep time: 0ms
[04/04/2024-09:01:56] [I] Streams: 1
[04/04/2024-09:01:56] [I] ExposeDMA: Disabled
[04/04/2024-09:01:56] [I] Data transfers: Enabled
[04/04/2024-09:01:56] [I] Spin-wait: Disabled
[04/04/2024-09:01:56] [I] Multithreading: Disabled
[04/04/2024-09:01:56] [I] CUDA Graph: Disabled
[04/04/2024-09:01:56] [I] Separate profiling: Disabled
[04/04/2024-09:01:56] [I] Time Deserialize: Disabled
[04/04/2024-09:01:56] [I] Time Refit: Disabled
[04/04/2024-09:01:56] [I] Skip inference: Disabled
[04/04/2024-09:01:56] [I] Inputs:
[04/04/2024-09:01:56] [I] === Reporting Options ===
[04/04/2024-09:01:56] [I] Verbose: Disabled
[04/04/2024-09:01:56] [I] Averages: 10 inferences
[04/04/2024-09:01:56] [I] Percentile: 99
[04/04/2024-09:01:56] [I] Dump refittable layers:Disabled
[04/04/2024-09:01:56] [I] Dump output: Disabled
[04/04/2024-09:01:56] [I] Profile: Disabled
[04/04/2024-09:01:56] [I] Export timing to JSON file:
[04/04/2024-09:01:56] [I] Export output to JSON file:
[04/04/2024-09:01:56] [I] Export profile to JSON file:
[04/04/2024-09:01:56] [I]
[04/04/2024-09:01:56] [I] === Device Information ===
[04/04/2024-09:01:56] [I] Selected Device: NVIDIA Tegra X1
[04/04/2024-09:01:56] [I] Compute Capability: 5.3
[04/04/2024-09:01:56] [I] SMs: 1
[04/04/2024-09:01:56] [I] Compute Clock Rate: 0.9216 GHz
[04/04/2024-09:01:56] [I] Device Global Memory: 3956 MiB
[04/04/2024-09:01:56] [I] Shared Memory per SM: 64 KiB
[04/04/2024-09:01:56] [I] Memory Bus Width: 64 bits (ECC disabled)
[04/04/2024-09:01:56] [I] Memory Clock Rate: 0.01275 GHz
[04/04/2024-09:01:56] [I]
[04/04/2024-09:01:56] [I] TensorRT version: 8001
[04/04/2024-09:01:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 221, GPU 2465 (MiB)
[04/04/2024-09:01:58] [I] Start parsing network model
[04/04/2024-09:01:58] [I] [TRT] ----------------------------------------------------------------
[04/04/2024-09:01:58] [I] [TRT] Input filename:   classification_model_export.onnx
[04/04/2024-09:01:58] [I] [TRT] ONNX IR version:  0.0.7
[04/04/2024-09:01:58] [I] [TRT] Opset version:    12
[04/04/2024-09:01:58] [I] [TRT] Producer name:    pytorch
[04/04/2024-09:01:58] [I] [TRT] Producer version: 2.2.0
[04/04/2024-09:01:58] [I] [TRT] Domain:
[04/04/2024-09:01:58] [I] [TRT] Model version:    0
[04/04/2024-09:01:58] [I] [TRT] Doc string:
[04/04/2024-09:01:58] [I] [TRT] ----------------------------------------------------------------
[04/04/2024-09:01:58] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:720: While parsing node number 53 [Range -> "/backbone/pos_embed/Range_output_0"]:
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:722: input: "/backbone/pos_embed/Constant_1_output_0"
input: "/backbone/pos_embed/Cast_output_0"
input: "/backbone/pos_embed/Constant_2_output_0"
output: "/backbone/pos_embed/Range_output_0"
name: "/backbone/pos_embed/Range"
op_type: "Range"

[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:726: ERROR: builtin_op_importers.cpp:3172 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
[04/04/2024-09:01:58] [E] Failed to parse onnx file
[04/04/2024-09:01:58] [I] Finish parsing network model
[04/04/2024-09:01:58] [E] Parsing model failed
[04/04/2024-09:01:58] [E] Engine creation failed
[04/04/2024-09:01:58] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=classification_model_export.onnx --maxShapes=input_1:16x3x224x224 --minShapes=input_1:1x3x224x224 --optShapes=input_1:8x3x224x224 --saveEngine=model_fan.engine

The NVIDIA deployment documentation does not mention that any special libraries or patches are needed for the classification PyTorch case.

Is it possible to deploy a model from classification_pyt on a Jetson Nano with DeepStream 6.0? What can we do?

Thank you

Darek

Please export a new ONNX file with opset 17 and retry.
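If re-exporting through TAO is not convenient, you could also try converting the existing ONNX file to a newer opset with the onnx version converter. This is only a rough sketch (the file names are placeholders, and the converter can fail on some graphs):

import onnx
from onnx import version_converter

# Load the ONNX file exported by TAO (path is a placeholder)
model = onnx.load("epoch_26.onnx")

# Convert the declared opset to 17 and save under a new name
converted = version_converter.convert_version(model, 17)
onnx.checker.check_model(converted)
onnx.save(converted, "epoch_26_opset17.onnx")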

Hi,

I exported a new ONNX file with opset 17 and still get the same error.

Edit:
I also checked opsets 18 and 19. What is strange is that, according to the ONNX source code, the Range op_type has been available since opset 11.
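To double-check what the exported file actually declares, the opset imports and the Range nodes can be inspected with the onnx Python package (a quick sketch; the path is just a placeholder):

import onnx

# Path to the exported model is a placeholder
model = onnx.load("epoch_26.onnx")

# Print the opset(s) declared by the graph
for imp in model.opset_import:
    print("domain:", imp.domain or "ai.onnx", "opset:", imp.version)

# List the Range nodes and their inputs; the TensorRT version on the
# Nano requires the dynamic inputs of Range to be INT32, not INT64
for node in model.graph.node:
    if node.op_type == "Range":
        print(node.name, "inputs:", list(node.input))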

Additionally:

# Generate a TensorRT Engine using TAO Deploy
!tao deploy classification_pyt gen_trt_engine \
                   -e $SPECS_DIR/chips_spec_test.yaml \
                   gen_trt_engine.onnx_file=$RESULTS_DIR/classification_tiny_224_lrmomentum/export/epoch_26.onnx \
                   gen_trt_engine.trt_engine=$RESULTS_DIR/classification_tiny_224_lrmomentum/gen_trt_engine/classification_model_export.engine \
                   results_dir=$RESULTS_DIR/classification_tiny_224_lrmomentum/

And got this:

2024-04-08 10:18:24,286 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-04-08 10:18:24,312 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy
2024-04-08 10:18:24,329 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/user/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-04-08 10:18:24,329 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Loading uff directly from the package source code
Loading uff directly from the package source code
sys:1: UserWarning: 
'chips_spec_test.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen cv.common.hydra.hydra_runner>:-1: UserWarning: 
'chips_spec_test.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Log file already exists at /results/classification_tiny_224_lrmomentum/status.json
Starting classification_pyt gen_trt_engine.
[04/08/2024-08:18:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 808 (MiB)
[04/08/2024-08:18:28] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU +268, now: CPU 1554, GPU 1076 (MiB)
Parsing ONNX model
List inputs:
Input 0 -> input_1.
[3, 224, 224].
0.
[04/08/2024-08:18:28] [TRT] [W] The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
[04/08/2024-08:18:28] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/08/2024-08:18:28] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
Network Description
Input 'input_1' with shape (-1, 3, 224, 224) and dtype DataType.FLOAT
Output 'probs' with shape (-1, 20) and dtype DataType.FLOAT
dynamic batch size handling
[04/08/2024-08:18:28] [TRT] [I] Graph optimization time: 0.048163 seconds.
[04/08/2024-08:18:28] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/08/2024-08:19:00] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[04/08/2024-08:19:00] [TRT] [I] Total Host Persistent Memory: 182256
[04/08/2024-08:19:00] [TRT] [I] Total Device Persistent Memory: 0
[04/08/2024-08:19:00] [TRT] [I] Total Scratch Memory: 2434560
[04/08/2024-08:19:00] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 14 MiB
[04/08/2024-08:19:00] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 209 steps to complete.
[04/08/2024-08:19:00] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 5.71687ms to assign 11 blocks to 209 nodes requiring 6201344 bytes.
[04/08/2024-08:19:00] [TRT] [I] Total Activation Memory: 6201344
[04/08/2024-08:19:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +28, now: CPU 0, GPU 28 (MiB)
Export finished successfully.
Gen_trt_engine finished successfully.
2024-04-08 08:19:00,532 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra: Sending telemetry data.
2024-04-08 08:19:12,718 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra: Execution status: PASS
2024-04-08 10:19:12,833 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

The engine is generated correctly on the server.

OK, so there is no issue when running inside the TAO Deploy docker.
You can check the TensorRT version on the Jetson; version 8.6.x is needed.
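For example, assuming the TensorRT Python bindings shipped with JetPack are installed on the Jetson, the version can be checked with:

import tensorrt as trt

# Prints the TensorRT version bundled with the installed JetPack,
# e.g. 8.2.1 on JetPack 4.6.x
print(trt.__version__)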

That could be a problem, because we use a Jetson Nano, and I cannot find a suitable TensorRT version for it.

You can try checking the latest JetPack.

Do you mean 4.6.3?

This is, I think, one of the newest versions for the Jetson Nano:

jetson@jetson-desktop:~/tao/export$ apt search nvidia-jetpack
Sorting... Done
Full Text Search... Done
nvidia-jetpack/stable 4.6.4-b39 arm64
  NVIDIA Jetpack Meta Package

And also TensorRT:

tensorrt/stable,now 8.2.1.9-1+cuda10.2 arm64 [installed]
  Meta package of TensorRT

And still the same error.

Edit:

TensorRT 8.5 is the last version with CUDA 10.2 support.

The latest JetPack is listed at JetPack SDK | NVIDIA Developer.

OK. Then using a model from classification_pyt (FAN) is impossible on the Jetson Nano.

JetPack 6.0 DP can be flashed onto the Nano. Its TensorRT version is 8.6.2.

Do you mean the Jetson Nano, not the Jetson Orin Nano?

In the documentation on that page I see:

JetPack 6 supports all NVIDIA Jetson Orin modules and developer kits

Does "developer kits" in that line include the Jetson Nano B01?

I suggest you create a topic in the Jetson Nano forum (Jetson Nano - NVIDIA Developer Forums) to check whether the Jetson Nano can be flashed with JetPack 6.0 DP.

OK. Then now I know that I need a newer TensorRT version.

Thank you very much, Morganh.
This topic can be closed.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.