Deploy TAO Classification_pyt FAN for Jetson Nano

Hi,

We have successfully used TAO with yolov4-tiny and classification_tf1 on the Jetson Nano. In that process we use the dedicated libraries from NVIDIA's sites, or for example DeepStream-Yolo for external YOLO models, or even models from the Darknet framework.

Last week we tried classification_pyt and got good results, but we cannot use the resulting model successfully on the Jetson Nano.
We tried to follow this page: https://docs.nvidia.com/tao/tao-toolkit/text/ds_tao/classification_ds.html#deepstream-configuration-file, but without success, because we get this error:

ERROR: Deserialize engine failed because file path: /home/jetson/tao/export/epoch_26.onnx.engine open error
0:00:02.711928044 13461   0x5589ace0f0 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1889> [UID = 2]: deserialize engine from file :/home/jetson/tao/export/epoch_26.onnx.engine failed
0:00:02.712051328 13461   0x5589ace0f0 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer1> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1996> [UID = 2]: deserialize backend context from engine from file :/home/jetson/tao/export/epoch_26.onnx.engine failed, try rebuild
0:00:02.712086745 13461   0x5589ace0f0 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<nvinfer1> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 2]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: [TRT]: ModelImporter.cpp:720: While parsing node number 53 [Range -> "/backbone/pos_embed/Range_output_0"]:
ERROR: [TRT]: ModelImporter.cpp:721: --- Begin node ---
ERROR: [TRT]: ModelImporter.cpp:722: input: "/backbone/pos_embed/Constant_1_output_0"
input: "/backbone/pos_embed/Cast_output_0"
input: "/backbone/pos_embed/Constant_2_output_0"
output: "/backbone/pos_embed/Range_output_0"
name: "/backbone/pos_embed/Range"
op_type: "Range"

ERROR: [TRT]: ModelImporter.cpp:723: --- End node ---
ERROR: [TRT]: ModelImporter.cpp:726: ERROR: builtin_op_importers.cpp:3172 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
ERROR: Failed to parse onnx file
ERROR: failed to build network since parsing model errors.
Caught SIGSEGV
#0  0x0000007f92f0ed5c in __waitpid (pid=<optimized out>, stat_loc=0x7fd4f8ab54, options=<optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x0000007f92f4a2e0 in g_on_error_stack_trace ()
#2  0x0000005556afcc3c in  ()
#3  0x0000005589fcc620 in  ()
Spinning.  Please run 'gdb gst-launch-1.0 13461' to continue debugging, Ctrl-C to quit, or Ctrl-\ to dump core.

This error is from DeepStream. We know that TensorRT cannot handle some layers. Here is also the output of trtexec:

/usr/src/tensorrt/bin/trtexec --onnx=classification_model_export.onnx --maxShapes="input_1":16x3x224x224 --minShapes="input_1":1x3x224x224 --optShapes="input_1":8x3x224x224 --saveEngine=model_fan.engine

[04/04/2024-09:01:56] [I] === Model Options ===
[04/04/2024-09:01:56] [I] Format: ONNX
[04/04/2024-09:01:56] [I] Model: classification_model_export.onnx
[04/04/2024-09:01:56] [I] Output:
[04/04/2024-09:01:56] [I] === Build Options ===
[04/04/2024-09:01:56] [I] Max batch: explicit
[04/04/2024-09:01:56] [I] Workspace: 16 MiB
[04/04/2024-09:01:56] [I] minTiming: 1
[04/04/2024-09:01:56] [I] avgTiming: 8
[04/04/2024-09:01:56] [I] Precision: FP32
[04/04/2024-09:01:56] [I] Calibration:
[04/04/2024-09:01:56] [I] Refit: Disabled
[04/04/2024-09:01:56] [I] Sparsity: Disabled
[04/04/2024-09:01:56] [I] Safe mode: Disabled
[04/04/2024-09:01:56] [I] Restricted mode: Disabled
[04/04/2024-09:01:56] [I] Save engine: model_fan.engine
[04/04/2024-09:01:56] [I] Load engine:
[04/04/2024-09:01:56] [I] NVTX verbosity: 0
[04/04/2024-09:01:56] [I] Tactic sources: Using default tactic sources
[04/04/2024-09:01:56] [I] timingCacheMode: local
[04/04/2024-09:01:56] [I] timingCacheFile:
[04/04/2024-09:01:56] [I] Input(s)s format: fp32:CHW
[04/04/2024-09:01:56] [I] Output(s)s format: fp32:CHW
[04/04/2024-09:01:56] [I] Input build shape: input_1=1x3x224x224+8x3x224x224+16x3x224x224
[04/04/2024-09:01:56] [I] Input calibration shapes: model
[04/04/2024-09:01:56] [I] === System Options ===
[04/04/2024-09:01:56] [I] Device: 0
[04/04/2024-09:01:56] [I] DLACore:
[04/04/2024-09:01:56] [I] Plugins:
[04/04/2024-09:01:56] [I] === Inference Options ===
[04/04/2024-09:01:56] [I] Batch: Explicit
[04/04/2024-09:01:56] [I] Input inference shape: input_1=8x3x224x224
[04/04/2024-09:01:56] [I] Iterations: 10
[04/04/2024-09:01:56] [I] Duration: 3s (+ 200ms warm up)
[04/04/2024-09:01:56] [I] Sleep time: 0ms
[04/04/2024-09:01:56] [I] Streams: 1
[04/04/2024-09:01:56] [I] ExposeDMA: Disabled
[04/04/2024-09:01:56] [I] Data transfers: Enabled
[04/04/2024-09:01:56] [I] Spin-wait: Disabled
[04/04/2024-09:01:56] [I] Multithreading: Disabled
[04/04/2024-09:01:56] [I] CUDA Graph: Disabled
[04/04/2024-09:01:56] [I] Separate profiling: Disabled
[04/04/2024-09:01:56] [I] Time Deserialize: Disabled
[04/04/2024-09:01:56] [I] Time Refit: Disabled
[04/04/2024-09:01:56] [I] Skip inference: Disabled
[04/04/2024-09:01:56] [I] Inputs:
[04/04/2024-09:01:56] [I] === Reporting Options ===
[04/04/2024-09:01:56] [I] Verbose: Disabled
[04/04/2024-09:01:56] [I] Averages: 10 inferences
[04/04/2024-09:01:56] [I] Percentile: 99
[04/04/2024-09:01:56] [I] Dump refittable layers:Disabled
[04/04/2024-09:01:56] [I] Dump output: Disabled
[04/04/2024-09:01:56] [I] Profile: Disabled
[04/04/2024-09:01:56] [I] Export timing to JSON file:
[04/04/2024-09:01:56] [I] Export output to JSON file:
[04/04/2024-09:01:56] [I] Export profile to JSON file:
[04/04/2024-09:01:56] [I]
[04/04/2024-09:01:56] [I] === Device Information ===
[04/04/2024-09:01:56] [I] Selected Device: NVIDIA Tegra X1
[04/04/2024-09:01:56] [I] Compute Capability: 5.3
[04/04/2024-09:01:56] [I] SMs: 1
[04/04/2024-09:01:56] [I] Compute Clock Rate: 0.9216 GHz
[04/04/2024-09:01:56] [I] Device Global Memory: 3956 MiB
[04/04/2024-09:01:56] [I] Shared Memory per SM: 64 KiB
[04/04/2024-09:01:56] [I] Memory Bus Width: 64 bits (ECC disabled)
[04/04/2024-09:01:56] [I] Memory Clock Rate: 0.01275 GHz
[04/04/2024-09:01:56] [I]
[04/04/2024-09:01:56] [I] TensorRT version: 8001
[04/04/2024-09:01:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 221, GPU 2465 (MiB)
[04/04/2024-09:01:58] [I] Start parsing network model
[04/04/2024-09:01:58] [I] [TRT] ----------------------------------------------------------------
[04/04/2024-09:01:58] [I] [TRT] Input filename:   classification_model_export.onnx
[04/04/2024-09:01:58] [I] [TRT] ONNX IR version:  0.0.7
[04/04/2024-09:01:58] [I] [TRT] Opset version:    12
[04/04/2024-09:01:58] [I] [TRT] Producer name:    pytorch
[04/04/2024-09:01:58] [I] [TRT] Producer version: 2.2.0
[04/04/2024-09:01:58] [I] [TRT] Domain:
[04/04/2024-09:01:58] [I] [TRT] Model version:    0
[04/04/2024-09:01:58] [I] [TRT] Doc string:
[04/04/2024-09:01:58] [I] [TRT] ----------------------------------------------------------------
[04/04/2024-09:01:58] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:720: While parsing node number 53 [Range -> "/backbone/pos_embed/Range_output_0"]:
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:722: input: "/backbone/pos_embed/Constant_1_output_0"
input: "/backbone/pos_embed/Cast_output_0"
input: "/backbone/pos_embed/Constant_2_output_0"
output: "/backbone/pos_embed/Range_output_0"
name: "/backbone/pos_embed/Range"
op_type: "Range"

[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[04/04/2024-09:01:58] [E] [TRT] ModelImporter.cpp:726: ERROR: builtin_op_importers.cpp:3172 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
[04/04/2024-09:01:58] [E] Failed to parse onnx file
[04/04/2024-09:01:58] [I] Finish parsing network model
[04/04/2024-09:01:58] [E] Parsing model failed
[04/04/2024-09:01:58] [E] Engine creation failed
[04/04/2024-09:01:58] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=classification_model_export.onnx --maxShapes=input_1:16x3x224x224 --minShapes=input_1:1x3x224x224 --optShapes=input_1:8x3x224x224 --saveEngine=model_fan.engine

The NVIDIA deployment documentation does not mention that any special libraries or patches are needed for the classification PyTorch case.

Is it possible to deploy a model from classification_pyt on a Jetson Nano with DeepStream 6.0? What can we do?

Thank you

Darek

Please export a new ONNX file with opset 17 and retry.
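If re-exporting through TAO is not convenient, you could also try converting the existing ONNX file to a newer opset with the onnx version converter. This is only a rough sketch (the file names are placeholders, and the converter can fail on some graphs):

import onnx
from onnx import version_converter

# Load the ONNX file exported by TAO (path is a placeholder)
model = onnx.load("epoch_26.onnx")

# Convert the declared opset to 17 and save under a new name
converted = version_converter.convert_version(model, 17)
onnx.checker.check_model(converted)
onnx.save(converted, "epoch_26_opset17.onnx")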

Hi,

I exported a new ONNX file with opset 17 and still get the same error.

Edit:
I also checked opsets 18 and 19. What is strange is that, according to the ONNX source code, the Range op_type has been available since opset 11.
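To double-check what the exported file actually declares, the opset imports and the Range nodes can be inspected with the onnx Python package (a quick sketch; the path is just a placeholder):

import onnx

# Path to the exported model is a placeholder
model = onnx.load("epoch_26.onnx")

# Print the opset(s) declared by the graph
for imp in model.opset_import:
    print("domain:", imp.domain or "ai.onnx", "opset:", imp.version)

# List the Range nodes and their inputs; the TensorRT version on the
# Nano requires the dynamic inputs of Range to be INT32, not INT64
for node in model.graph.node:
    if node.op_type == "Range":
        print(node.name, "inputs:", list(node.input))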

Additionally:

# Generate a TensorRT Engine using TAO Deploy
!tao deploy classification_pyt gen_trt_engine \
                   -e $SPECS_DIR/chips_spec_test.yaml \
                   gen_trt_engine.onnx_file=$RESULTS_DIR/classification_tiny_224_lrmomentum/export/epoch_26.onnx \
                   gen_trt_engine.trt_engine=$RESULTS_DIR/classification_tiny_224_lrmomentum/gen_trt_engine/classification_model_export.engine \
                   results_dir=$RESULTS_DIR/classification_tiny_224_lrmomentum/

And got this:

2024-04-08 10:18:24,286 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-04-08 10:18:24,312 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy
2024-04-08 10:18:24,329 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/user/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-04-08 10:18:24,329 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Loading uff directly from the package source code
Loading uff directly from the package source code
sys:1: UserWarning: 
'chips_spec_test.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen cv.common.hydra.hydra_runner>:-1: UserWarning: 
'chips_spec_test.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Log file already exists at /results/classification_tiny_224_lrmomentum/status.json
Starting classification_pyt gen_trt_engine.
[04/08/2024-08:18:25] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 808 (MiB)
[04/08/2024-08:18:28] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU +268, now: CPU 1554, GPU 1076 (MiB)
Parsing ONNX model
List inputs:
Input 0 -> input_1.
[3, 224, 224].
0.
[04/08/2024-08:18:28] [TRT] [W] The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
[04/08/2024-08:18:28] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/08/2024-08:18:28] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
Network Description
Input 'input_1' with shape (-1, 3, 224, 224) and dtype DataType.FLOAT
Output 'probs' with shape (-1, 20) and dtype DataType.FLOAT
dynamic batch size handling
[04/08/2024-08:18:28] [TRT] [I] Graph optimization time: 0.048163 seconds.
[04/08/2024-08:18:28] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/08/2024-08:19:00] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[04/08/2024-08:19:00] [TRT] [I] Total Host Persistent Memory: 182256
[04/08/2024-08:19:00] [TRT] [I] Total Device Persistent Memory: 0
[04/08/2024-08:19:00] [TRT] [I] Total Scratch Memory: 2434560
[04/08/2024-08:19:00] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 14 MiB
[04/08/2024-08:19:00] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 209 steps to complete.
[04/08/2024-08:19:00] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 5.71687ms to assign 11 blocks to 209 nodes requiring 6201344 bytes.
[04/08/2024-08:19:00] [TRT] [I] Total Activation Memory: 6201344
[04/08/2024-08:19:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +28, now: CPU 0, GPU 28 (MiB)
Export finished successfully.
Gen_trt_engine finished successfully.
2024-04-08 08:19:00,532 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra: Sending telemetry data.
2024-04-08 08:19:12,718 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.common.entrypoint.entrypoint_hydra: Execution status: PASS
2024-04-08 10:19:12,833 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

The engine is generated correctly on the server.

OK, so there is no issue when running inside the TAO Deploy docker.
You can check the TensorRT version on the Jetson; version 8.6.x is needed.
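For example, assuming the TensorRT Python bindings shipped with JetPack are installed on the Jetson, the version can be checked with:

import tensorrt as trt

# Prints the TensorRT version bundled with the installed JetPack,
# e.g. 8.2.1 on JetPack 4.6.x
print(trt.__version__)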

That could be a problem, because we use a Jetson Nano, and I cannot find a suitable TensorRT version for it.

You can try checking the latest JetPack.

Do you mean 4.6.3?

This is, I think, one of the newest versions for the Jetson Nano:

jetson@jetson-desktop:~/tao/export$ apt search nvidia-jetpack
Sorting... Done
Full Text Search... Done
nvidia-jetpack/stable 4.6.4-b39 arm64
  NVIDIA Jetpack Meta Package

And also TensorRT:

tensorrt/stable,now 8.2.1.9-1+cuda10.2 arm64 [installed]
  Meta package of TensorRT

And still the same error.

Edit:

TensorRT 8.5 is the last version with CUDA 10.2 support.

The latest JetPack is listed at JetPack SDK | NVIDIA Developer.

OK. Then using a model from classification_pyt (FAN) is impossible on the Jetson Nano.

JetPack 6.0 DP can be flashed onto the Nano. Its TensorRT version is 8.6.2.

Do you mean the Jetson Nano, not the Jetson Orin Nano?

In the documentation on that page I see:

JetPack 6 supports all NVIDIA Jetson Orin modules and developer kits

Does "developer kits" in that line include the Jetson Nano B01?

I suggest you create a topic in the Jetson Nano forum (Jetson Nano - NVIDIA Developer Forums) to check whether the Jetson Nano can be flashed with JetPack 6.0 DP.

OK. Then now I know that I need a newer TensorRT version.

Thank you very much, Morganh.
This topic can be closed.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.