Application Migration from Jetson Orin NX(16G) to Jetson Orin NX Super(16G)

Platform:
Machine: aarch64
System: Linux
Python: 3.10.12
JetPack: 6.2
DeepStream: 7.1
Libraries:
CUDA: 12.6.68
cuDNN: 9.3.0.75
TensorRT: 10.3.0.30
OpenCV: 4.10.0 with CUDA: YES
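For reference, platform details like the ones above can usually be queried directly on a Jetson. This is a sketch of commands that require a flashed Jetson device; exact package names can differ by JetPack release:

```shell
# L4T / BSP release (e.g. "# R36 (release), REVISION: 4.3")
cat /etc/nv_tegra_release
# JetPack meta-package version, if the meta-package is installed
apt-cache show nvidia-jetpack | grep -i version
# Installed TensorRT and cuDNN package versions
dpkg -l | grep -Ei "tensorrt|cudnn"
# Python interpreter version
python3 --version
```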

I have implemented a DeepStream-based application on the Jetson Orin NX platform, where it runs normally. Problems appeared when I ported it to the Orin NX Super platform.

  1. I referred to this link to replace the whole “sources/apps”.
  2. I referred to this link to recompile the dependency library “nvdsinfer_custom_impl_Yolo.so”.
    When I converted the .pt model to an engine on the Super platform using
    “/usr/src/tensorrt/bin/trtexec
    --onnx=model_jetson_pgie_n_20250226.onnx
    --saveEngine=./model_jetson_pgie_n_20250411_fp16_batch4.engine
    --memPoolSize=workspace:4096MiB
    --fp16”
    I noticed an issue with the workspace size (workspace_small.png), which should look like this picture (workspace_normal.png), and it led to the warning “Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.”

So I removed “--memPoolSize=workspace:4096MiB” so that the conversion could complete normally.

  3. After switching to the new model and library, when running the program (similar to DeepStream-test2 in deepstream_python_apps), the initial display of the loaded model is shown in super70.txt: —start—
    After running for a period of time, the error is shown in super70.txt: —error down—
    Here are super70.txt (running on the Super) and normal_out.txt (running on the Orin NX):
    super70.txt (6.0 KB)
    normal_out.txt (5.3 KB)

My questions are:

  • My procedure should be standard, so there should be no problem, right?
  • How can I solve the workspace issue raised in the second point? Could it have caused the CUDA memory error after running for a period of time?
  • If the error is unrelated to the workspace issue during model conversion, how can I solve the CUDA problem encountered on the Super platform?

Hi,
Did you re-flash the whole system to r36.4.3 to enable Super mode? The mode is enabled through updates to some config files and the device tree. It should have no impact on userspace applications.
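One way to check this on the device (a sketch that requires a flashed Jetson; the listed mode names depend on the installed BSP) is to query the power model, since the Super profiles are exposed there:

```shell
# Show the currently selected power mode; on a Super-enabled
# r36.4.3 image the MAXN SUPER profile should be available
sudo nvpmodel -q
# The full set of modes is defined in /etc/nvpmodel.conf
```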

Hi,

Please try adding one more option to allow TensorRT to spend more build time exploring more optimization options.

$ /usr/src/tensorrt/bin/trtexec --builderOptimizationLevel=5 ...

Thanks.

I’m not quite sure what you mean by ‘r36.4.3’, but I followed this link to flash it.


And it does have Super mode.

&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --builderOptimizationLevel=5 --onnx=model_jetson_pgie_n_20250226.onnx --saveEngine=./model_jetson_pgie_n_20250411_fp16_batch4.engine --memPoolSize=workspace:4096MiB --fp16
[04/14/2025-10:11:07] [I] === Model Options ===
[04/14/2025-10:11:07] [I] Format: ONNX
[04/14/2025-10:11:07] [I] Model: model_jetson_pgie_n_20250226.onnx
[04/14/2025-10:11:07] [I] Output:
[04/14/2025-10:11:07] [I] === Build Options ===
[04/14/2025-10:11:07] [I] Memory Pools: workspace: 0.00390625 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[04/14/2025-10:11:07] [I] avgTiming: 8
[04/14/2025-10:11:07] [I] Precision: FP32+FP16
[04/14/2025-10:11:07] [I] LayerPrecisions: 
[04/14/2025-10:11:07] [I] Layer Device Types: 
[04/14/2025-10:11:07] [I] Calibration: 
[04/14/2025-10:11:07] [I] Refit: Disabled
[04/14/2025-10:11:07] [I] Strip weights: Disabled
[04/14/2025-10:11:07] [I] Version Compatible: Disabled
[04/14/2025-10:11:07] [I] ONNX Plugin InstanceNorm: Disabled
[04/14/2025-10:11:07] [I] TensorRT runtime: full
[04/14/2025-10:11:07] [I] Lean DLL Path: 
[04/14/2025-10:11:07] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[04/14/2025-10:11:07] [I] Exclude Lean Runtime: Disabled
[04/14/2025-10:11:07] [I] Sparsity: Disabled
[04/14/2025-10:11:07] [I] Safe mode: Disabled
[04/14/2025-10:11:07] [I] Build DLA standalone loadable: Disabled
[04/14/2025-10:11:07] [I] Allow GPU fallback for DLA: Disabled
[04/14/2025-10:11:07] [I] DirectIO mode: Disabled
[04/14/2025-10:11:07] [I] Restricted mode: Disabled
[04/14/2025-10:11:07] [I] Skip inference: Disabled
[04/14/2025-10:11:07] [I] Save engine: ./model_jetson_pgie_n_20250411_fp16_batch4.engine
[04/14/2025-10:11:07] [I] Load engine: 
[04/14/2025-10:11:07] [I] Profiling verbosity: 0
[04/14/2025-10:11:07] [I] Tactic sources: Using default tactic sources
[04/14/2025-10:11:07] [I] timingCacheMode: local
[04/14/2025-10:11:07] [I] timingCacheFile: 
[04/14/2025-10:11:07] [I] Enable Compilation Cache: Enabled
[04/14/2025-10:11:07] [I] errorOnTimingCacheMiss: Disabled
[04/14/2025-10:11:07] [I] Preview Features: Use default preview flags.
[04/14/2025-10:11:07] [I] MaxAuxStreams: -1
[04/14/2025-10:11:07] [I] BuilderOptimizationLevel: 5
[04/14/2025-10:11:07] [I] Calibration Profile Index: 0
[04/14/2025-10:11:07] [I] Weight Streaming: Disabled
[04/14/2025-10:11:07] [I] Runtime Platform: Same As Build
[04/14/2025-10:11:07] [I] Debug Tensors: 
[04/14/2025-10:11:07] [I] Input(s)s format: fp32:CHW
[04/14/2025-10:11:07] [I] Output(s)s format: fp32:CHW
[04/14/2025-10:11:07] [I] Input build shapes: model
[04/14/2025-10:11:07] [I] Input calibration shapes: model
[04/14/2025-10:11:07] [I] === System Options ===
[04/14/2025-10:11:07] [I] Device: 0
[04/14/2025-10:11:07] [I] DLACore: 
[04/14/2025-10:11:07] [I] Plugins:
[04/14/2025-10:11:07] [I] setPluginsToSerialize:
[04/14/2025-10:11:07] [I] dynamicPlugins:
[04/14/2025-10:11:07] [I] ignoreParsedPluginLibs: 0
[04/14/2025-10:11:07] [I] 
[04/14/2025-10:11:07] [I] === Inference Options ===
[04/14/2025-10:11:07] [I] Batch: Explicit
[04/14/2025-10:11:07] [I] Input inference shapes: model
[04/14/2025-10:11:07] [I] Iterations: 10
[04/14/2025-10:11:07] [I] Duration: 3s (+ 200ms warm up)
[04/14/2025-10:11:07] [I] Sleep time: 0ms
[04/14/2025-10:11:07] [I] Idle time: 0ms
[04/14/2025-10:11:07] [I] Inference Streams: 1
[04/14/2025-10:11:07] [I] ExposeDMA: Disabled
[04/14/2025-10:11:07] [I] Data transfers: Enabled
[04/14/2025-10:11:07] [I] Spin-wait: Disabled
[04/14/2025-10:11:07] [I] Multithreading: Disabled
[04/14/2025-10:11:07] [I] CUDA Graph: Disabled
[04/14/2025-10:11:07] [I] Separate profiling: Disabled
[04/14/2025-10:11:07] [I] Time Deserialize: Disabled
[04/14/2025-10:11:07] [I] Time Refit: Disabled
[04/14/2025-10:11:07] [I] NVTX verbosity: 0
[04/14/2025-10:11:07] [I] Persistent Cache Ratio: 0
[04/14/2025-10:11:07] [I] Optimization Profile Index: 0
[04/14/2025-10:11:07] [I] Weight Streaming Budget: 100.000000%
[04/14/2025-10:11:07] [I] Inputs:
[04/14/2025-10:11:07] [I] Debug Tensor Save Destinations:
[04/14/2025-10:11:07] [I] === Reporting Options ===
[04/14/2025-10:11:07] [I] Verbose: Disabled
[04/14/2025-10:11:07] [I] Averages: 10 inferences
[04/14/2025-10:11:07] [I] Percentiles: 90,95,99
[04/14/2025-10:11:07] [I] Dump refittable layers:Disabled
[04/14/2025-10:11:07] [I] Dump output: Disabled
[04/14/2025-10:11:07] [I] Profile: Disabled
[04/14/2025-10:11:07] [I] Export timing to JSON file: 
[04/14/2025-10:11:07] [I] Export output to JSON file: 
[04/14/2025-10:11:07] [I] Export profile to JSON file: 
[04/14/2025-10:11:07] [I] 
[04/14/2025-10:11:07] [I] === Device Information ===
[04/14/2025-10:11:07] [I] Available Devices: 
[04/14/2025-10:11:07] [I]   Device 0: "Orin" UUID: GPU-10bbbeac-937e-5daa-9911-3c1c1a2fde5f
[04/14/2025-10:11:07] [I] Selected Device: Orin
[04/14/2025-10:11:07] [I] Selected Device ID: 0
[04/14/2025-10:11:07] [I] Selected Device UUID: GPU-10bbbeac-937e-5daa-9911-3c1c1a2fde5f
[04/14/2025-10:11:07] [I] Compute Capability: 8.7
[04/14/2025-10:11:07] [I] SMs: 8
[04/14/2025-10:11:07] [I] Device Global Memory: 15655 MiB
[04/14/2025-10:11:07] [I] Shared Memory per SM: 164 KiB
[04/14/2025-10:11:07] [I] Memory Bus Width: 256 bits (ECC disabled)
[04/14/2025-10:11:07] [I] Application Compute Clock Rate: 1.173 GHz
[04/14/2025-10:11:07] [I] Application Memory Clock Rate: 1.173 GHz
[04/14/2025-10:11:07] [I] 
[04/14/2025-10:11:07] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[04/14/2025-10:11:07] [I] 
[04/14/2025-10:11:07] [I] TensorRT version: 10.3.0
[04/14/2025-10:11:07] [I] Loading standard plugins
[04/14/2025-10:11:07] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 31, GPU 2320 (MiB)
[04/14/2025-10:11:10] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +928, GPU +1092, now: CPU 1002, GPU 3456 (MiB)
[04/14/2025-10:11:10] [I] Start parsing network model.
[04/14/2025-10:11:10] [I] [TRT] ----------------------------------------------------------------
[04/14/2025-10:11:10] [I] [TRT] Input filename:   model_jetson_pgie_n_20250226.onnx
[04/14/2025-10:11:10] [I] [TRT] ONNX IR version:  0.0.7
[04/14/2025-10:11:10] [I] [TRT] Opset version:    12
[04/14/2025-10:11:10] [I] [TRT] Producer name:    pytorch
[04/14/2025-10:11:10] [I] [TRT] Producer version: 2.6.0
[04/14/2025-10:11:10] [I] [TRT] Domain:           
[04/14/2025-10:11:10] [I] [TRT] Model version:    0
[04/14/2025-10:11:10] [I] [TRT] Doc string:       
[04/14/2025-10:11:10] [I] [TRT] ----------------------------------------------------------------
[04/14/2025-10:11:10] [I] Finished parsing network model. Parse time: 0.0496349
[04/14/2025-10:11:10] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/14/2025-10:11:50] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[04/14/2025-10:17:26] [W] [TRT] Engine generation failed with backend strategy 4.
Error message: [optimizer.cpp::computeCosts::4148] Error Code 4: Internal Error (Could not find any implementation for node /1/ArgMax due to insufficient workspace. See verbose log for requested sizes.).
Skipping this backend strategy.
[04/14/2025-10:17:26] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/14/2025-10:17:44] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[04/14/2025-10:19:29] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 8601600 detected for tactic 0x0000000000000000.
[04/14/2025-10:19:29] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 4300800 detected for tactic 0x0000000000000000.
[04/14/2025-10:19:29] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 2150400 detected for tactic 0x0000000000000000.
[04/14/2025-10:19:29] [W] [TRT] Engine generation failed with backend strategy 3.
Error message: [optimizer.cpp::computeCosts::4148] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[/0/model.22/Concat...ONNXTRT_ShapeShuffle_30 + /0/model.22/dfl/Transpose_1]} due to insufficient workspace. See verbose log for requested sizes.).
Skipping this backend strategy.
[04/14/2025-10:19:29] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/14/2025-10:19:51] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[04/14/2025-10:23:22] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 403200 detected for tactic 0x0000000000000000.
[04/14/2025-10:23:22] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 235200 detected for tactic 0x0000000000000000.
[04/14/2025-10:23:22] [W] [TRT] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 235200 detected for tactic 0x0000000000000000.
[04/14/2025-10:23:22] [W] [TRT] Engine generation failed with backend strategy 2.
Error message: [optimizer.cpp::computeCosts::4148] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[/0/model.22/Split_1_27.../1/Slice]} due to insufficient workspace. See verbose log for requested sizes.).
Skipping this backend strategy.
[04/14/2025-10:23:22] [E] Error[2]: [engineBuilder.cpp::makeEngineFromSubGraph::1879] Error Code 2: Internal Error (Engine generation failed because all backend strategies failed.)
[04/14/2025-10:23:22] [E] Engine could not be created from network
[04/14/2025-10:23:22] [E] Building engine failed
[04/14/2025-10:23:22] [E] Failed to create engine from model or file.
[04/14/2025-10:23:22] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --builderOptimizationLevel=5 --onnx=model_jetson_pgie_n_20250226.onnx --saveEngine=./model_jetson_pgie_n_20250411_fp16_batch4.engine --memPoolSize=workspace:4096MiB --fp16

Hi,

We just tried it with a built-in model, and the workspace can be set to 4096 without issue
(using 4096 instead of 4096MiB).
Please give it a try.

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --memPoolSize=workspace:4096
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --memPoolSize=workspace:4096
[04/14/2025-02:52:50] [I] === Model Options ===
[04/14/2025-02:52:50] [I] Format: ONNX
[04/14/2025-02:52:50] [I] Model: /usr/src/tensorrt/data/mnist/mnist.onnx
[04/14/2025-02:52:50] [I] Output:
[04/14/2025-02:52:50] [I] === Build Options ===
[04/14/2025-02:52:50] [I] Memory Pools: workspace: 4096 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[04/14/2025-02:52:50] [I] avgTiming: 8
[04/14/2025-02:52:50] [I] Precision: FP32
[04/14/2025-02:52:50] [I] LayerPrecisions: 
[04/14/2025-02:52:50] [I] Layer Device Types: 
...

Thanks.
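Applied to the model in this thread, the corrected command would look like the sketch below (same file names as in the original post, with only the MiB suffix dropped; this still needs to run on the Jetson itself):

```shell
# Build the FP16 engine with a 4096 MiB workspace pool;
# note the value is given as "4096", not "4096MiB"
/usr/src/tensorrt/bin/trtexec \
  --onnx=model_jetson_pgie_n_20250226.onnx \
  --saveEngine=./model_jetson_pgie_n_20250411_fp16_batch4.engine \
  --memPoolSize=workspace:4096 \
  --fp16
```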

Luckily, it works. I will use that model to verify the software runtime problems.
^_^ Thank you.

Is there a method to compare two models? I wonder why my new model cannot be used normally (the bounding boxes are not displayed), while the old model can display them. It is strange that both the new and old engine models come from the same .pt model.

  • My operation should be standardized and there should be no problem, right?

Hi,

Do you use the same ONNX file to convert the TensorRT engine?
If not, could you give it a try, as this will eliminate one of the differences?

Thanks.
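Beyond re-using the same ONNX file, one hedged way to compare the behavior of the two models (assuming the Polygraphy tool that ships with TensorRT is installed; the file names here are placeholders, not the actual files from this thread) is to compare TensorRT output against ONNX Runtime for each export:

```shell
# Run the old export through TensorRT and ONNX Runtime and
# compare the outputs; repeat for the new export
polygraphy run old_model.onnx --trt --onnxrt
polygraphy run new_model.onnx --trt --onnxrt
```

If one export matches ONNX Runtime and the other diverges, the difference likely comes from the export step rather than the engine build.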

Right. When I used the old ONNX file for conversion as you recommended, it could display the bounding boxes. However, after running for around 10 minutes, the window display stops with an exception and the terminal outputs:

GPUassert: an illegal memory access was encountered /dvs/git/dirty/git-master_linux/deepstream/sdk/src/utils/nvmultiobjecttracker/src/modules/cuDCFv2/cuDCFFrameTransformTexture.cu 693
0:10:40.834126732 15627 0xaaaad3a8a240 ERROR nvinfer gstnvinfer.cpp:1267:get_converted_buffer: cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
0:10:40.834163060 15627 0xaaaad3a89c60 ERROR nvinfer gstnvinfer.cpp:1225:get_converted_buffer: cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
0:10:40.834212318 15627 0xaaaad3a89c60 WARN nvinfer gstnvinfer.cpp:1894:gst_nvinfer_process_objects: error: Buffer conversion failed
ERROR: Failed to add cudaStream callback for returning input buffers, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:10:40.834189754 15627 0xaaaad3a8a240 WARN nvinfer gstnvinfer.cpp:1576:gst_nvinfer_process_full_frame: error: Buffer conversion failed
0:10:40.834359582 15627 0xaaaad3a89c00 WARN nvinfer gstnvinfer.cpp:1420:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:10:40.834324726 15627 0xaaaad3a89c60 ERROR nvinfer gstnvinfer.cpp:1225:get_converted_buffer: cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
0:10:40.834432046 15627 0xaaaad3a89c60 WARN nvinfer gstnvinfer.cpp:1894:gst_nvinfer_process_objects: error: Buffer conversion failed

!![Exception] GPUassert failed
An exception occurred. GPUassert failed
gstnvtracker: Low-level tracker lib returned error 1
/dvs/git/dirty/git-master_linux/nvutils/nvbufsurftransform/nvbufsurftransform_copy.cpp:552: => Failed in mem copy

Linux-For-Tegra version - revision 36.4.3

Sorry, I kind of forgot. After checking, I confirmed that I did re-flash the system to r36.4.3 using the following:


Hi,

The “Failed in mem copy” error reported by nvbufsurftransform_copy is a known issue and is fixed in the upcoming DeepStream release.
Please see the comment below for more info:

Thanks.

Do you mean this issue is expected in the current version of JetPack 6.2 with DeepStream 7.1?
I am curious why JetPack 6.0 with DeepStream 7.0 does not seem to have this problem. Also, if I decrease the AI video channels from 4 to 2, the issue seems to go away. (I say this because the 2-channel video has been running normally since 6 pm yesterday.)

Hi,

This is a known issue, as it has been reported by other users before.
There is a WAR (workaround) shared in the above link. Could you give it a try to see if it also helps with your issue?

nvvideoconvert compute-hw=1 nvbuf-memory-type=3
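For example, those properties could be applied in a gst-launch-1.0 pipeline like the sketch below (the surrounding elements and the input file are placeholders; the actual DeepStream application would set the same properties on its nvvideoconvert element instead):

```shell
# Apply the suggested WAR properties on nvvideoconvert
# (placeholder pipeline; sample.mp4 is a hypothetical input)
gst-launch-1.0 filesrc location=sample.mp4 ! qtdemux ! h264parse ! \
  nvv4l2decoder ! nvvideoconvert compute-hw=1 nvbuf-memory-type=3 ! \
  'video/x-raw(memory:NVMM)' ! fakesink
```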

Thanks.