Building OpenCV with CUDA support on Orin NX 16GB

I have built OpenCV 4.10 on the Photon GNX002 board with the Orin NX 16 GB SoC.

This is part of a migration from JetPack 5.1.2 to JetPack 6.2, using the latest BSP that supports our cameras. We link our C++ software against our own build of OpenCV 4.4, built with CUDA 11.4.315, the same CUDA version that ships with JetPack 5.1.2.

Another post inquired about building OpenCV 4.8.0 with CUDA support on Orin NX and JetPack 6.2. However, I quickly decided to go with 4.10 instead, to avoid problems with g++ 11, as recommended in another post: OpenCV with CUDA support on JetPack 6.0.

I have attached the script I used to build OpenCV. I recreated the engine files from the same ONNX file I used on JetPack 5.1.2. Then I ran a release build of our software and got “LLVM ERROR: out of memory” in our application logs; the application crashed without any other errors.

When I run our C++ application from a DEBUG build, everything works as expected: I get cameras, analytics, etc. No errors in the logs.

I have written a small test program to check whether TensorRT 10.3 is the cause of this LLVM out-of-memory error. It does not fail. What other library could be causing this? We use OpenCV extensively for video frame processing and analytics, and the classes in NvInfer.h for inference.
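
Roughly, the test just deserializes one of the engine files through the TensorRT 10 API, something like this sketch (simplified, with the engine file name hard-coded; not the actual test code):

#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

namespace nv = nvinfer1;

class Logger : public nv::ILogger {
    void log(Severity sev, const char* msg) noexcept override {
        if (sev <= Severity::kWARNING) std::fprintf(stderr, "[TRT] %s\n", msg);
    }
};

int main() {
    Logger logger;
    // Read the serialized engine built on this JetPack/TensorRT version.
    std::ifstream f("yolov8n_1_3_640_640_fp16_jp62.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());
    // TensorRT 10 removed destroy(); plain delete (or unique_ptr) is used.
    // The runtime is declared first so it outlives the engine.
    std::unique_ptr<nv::IRuntime> runtime(nv::createInferRuntime(logger));
    std::unique_ptr<nv::ICudaEngine> engine(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    std::printf("engine %s\n", engine ? "OK" : "FAILED");
    return engine ? 0 : 1;
}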

I could try building OpenCV with a different CUDA version instead of the CUDA 12.6.68 I got from the JetPack. Would that help, and if so, which version?

Below is the system information after I ran the attached script.

Thanks

build_opencv_JP62.sh.txt (6.0 KB)

Hi,
Could you try the script:

Jetson AGX Orin FAQ
install_opencv4.10.0_Jetpack6.1.sh

See if it works for the use-case.

@DaneLLL Sorry, I first tested with the script that installs OpenCV 4.9 by mistake, and ran into the opencv_contrib error that has been documented elsewhere.

I am now running the install_opencv4.10.0_Jetpack6.1.sh script to test on my system. It builds without errors. The only modification I made to the script was to add support for the opencv_world package, because one of our C++ programs uses it:

-D BUILD_opencv_world=ON

However, when I ran our applications in RELEASE mode, the same error appeared in the logs: LLVM ERROR: out of memory.

The service restarts happen every 2 to 3 seconds. Before I continue digging deeper, can you suggest a containerized solution for OpenCV with CUDA (and thus with access to the GPU) that keeps the OpenCV 4.8 without CUDA support that the SDK installs? That way, one could build the C++ applications to dynamically link against OpenCV with or without GPU acceleration and compare application performance and resource utilization.

If we cannot use CUDA for OpenCV, I’d like to know how much performance we are giving up.

We had never encountered this error on JetPack 5.x on TX2, Orin NX, or AGX-based systems.

Thanks,

Pablo

Hi,
We don’t experience this issue. Do you observe it without -D BUILD_opencv_world=ON?

@DaneLLL, thanks for your reply. I rebuilt without opencv_world; I had to modify our C++ projects to include and link the OpenCV libraries individually. I am now stuck converting the YOLOv8 nano model from its ONNX form (a 13 MB file) to a TensorRT engine file. The C++ code we have been using since YOLOv4, and more recently with YOLOv8 on JetPack 5.1.2, fails on JetPack 6.2 with an out-of-memory error. I tried a more up-to-date Python 3 script to do the same conversion; after a few minutes it fails with the same error.
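
For context, the C++ conversion code follows the standard TensorRT builder flow. On TensorRT 10 that flow looks roughly like this sketch (simplified, with placeholder file names; not the actual TensorRTGenerator source):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>
#include <memory>

namespace nv = nvinfer1;

class Logger : public nv::ILogger {
    void log(Severity sev, const char* msg) noexcept override {
        if (sev <= Severity::kWARNING) std::fprintf(stderr, "[TRT] %s\n", msg);
    }
};

int main() {
    Logger logger;
    std::unique_ptr<nv::IBuilder> builder(nv::createInferBuilder(logger));
    // TensorRT 10 networks are always explicit-batch, so the flags are 0.
    std::unique_ptr<nv::INetworkDefinition> network(builder->createNetworkV2(0));
    std::unique_ptr<nvonnxparser::IParser> parser(
        nvonnxparser::createParser(*network, logger));
    if (!parser->parseFromFile("yolov8n_1_3_640_640.onnx",
            static_cast<int>(nv::ILogger::Severity::kWARNING)))
        return 1;

    std::unique_ptr<nv::IBuilderConfig> config(builder->createBuilderConfig());
    // 512 MB workspace, equivalent to trtexec --memPoolSize=workspace:512M.
    config->setMemoryPoolLimit(nv::MemoryPoolType::kWORKSPACE, 512ULL << 20);
    config->setFlag(nv::BuilderFlag::kFP16);

    // This is the step where the out-of-memory error surfaces.
    std::unique_ptr<nv::IHostMemory> plan(
        builder->buildSerializedNetwork(*network, *config));
    if (!plan) return 1;

    std::ofstream out("yolov8n_1_3_640_640_fp16_jp62.engine", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
    return 0;
}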

This is the Python script:

onnx_to_tensorrt.py.txt (8.7 KB)

yolov8n_1_3_640_640.onnx.txt (12.2 MB)

python3 onnx_to_tensorrt.py --fp16 --workspace 512 --force
[2025-09-10 03:55:38,959] [INFO] === System Information ===
[2025-09-10 03:55:38,959] [INFO] TensorRT version: 10.3.0
[2025-09-10 03:55:38,960] [INFO] pynvml not available, skipping GPU memory info
[2025-09-10 03:55:38,960] [INFO] ===============================
[2025-09-10 03:55:38,960] [INFO] Starting TensorRT engine conversion...
[2025-09-10 03:55:38,960] [INFO] Input ONNX: /home/intelliview/git/hvr7/SmrtHVR/3rdParty/YoloData/Sources/yolov8n_1_3_640_640.onnx
[2025-09-10 03:55:38,960] [INFO] Output Engine: yolov8n_1_3_640_640_fp16_jp62.engine
[2025-09-10 03:55:38,960] [INFO] Workspace Size: 512 MB
[2025-09-10 03:55:38,960] [INFO] Precision: FP16=True, INT8=False
[09/10/2025-03:55:39] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 35, GPU 4562 (MiB)
[09/10/2025-03:55:41] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +927, GPU +754, now: CPU 1005, GPU 5360 (MiB)
[2025-09-10 03:55:41,351] [INFO] Applying Jetson Orin NX memory optimizations...
[2025-09-10 03:55:41,351] [INFO] Workspace memory limit: 512 MB
[2025-09-10 03:55:41,351] [INFO] Tactic shared memory: 128 MB
[2025-09-10 03:55:41,351] [INFO] FP16 precision enabled
[2025-09-10 03:55:41,352] [INFO] Parsing ONNX model...
[2025-09-10 03:55:41,385] [INFO] ONNX model parsed successfully
[2025-09-10 03:55:41,385] [INFO] Network inputs: 1
[2025-09-10 03:55:41,385] [INFO] Network outputs: 1
[2025-09-10 03:55:41,385] [INFO] Input 0: images - Shape: (1, 3, 640, 640) - Dtype: DataType.FLOAT
[2025-09-10 03:55:41,386] [INFO] Output 0: output0 - Shape: (1, 84, 8400) - Dtype: DataType.FLOAT
[2025-09-10 03:55:41,386] [INFO] Building TensorRT engine... This may take several minutes.
[09/10/2025-03:55:41] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/10/2025-04:04:19] [TRT] [I] Detected 1 inputs and 3 output network tensors.
[09/10/2025-04:04:20] [TRT] [E] [defaultAllocator.cpp::allocateAsync::48] Error Code 1: Cuda Runtime (out of memory)
[09/10/2025-04:04:20] [TRT] [W] Requested amount of GPU memory (512 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[09/10/2025-04:04:20] [TRT] [E] [wtsEngineRtUtils.cpp::executeWtsEngine::159] Error Code 2: OutOfMemory (Requested size was 512 bytes.)
[2025-09-10 04:04:20,371] [ERROR] Failed to build TensorRT engine
[2025-09-10 04:04:20,846] [ERROR] ❌ Conversion failed

It appears that something is wrong with the memory allocation used by TensorRT.
What are my options at this point?
The same C++ code performed this conversion under JetPack 5.1.2 on exactly the same hardware. The Python code confirms the memory allocation problem in TensorRT 10.3.0:

[09/10/2025-04:15:28] [TRT] [E] [defaultAllocator.cpp::allocateAsync::48] Error Code 1: Cuda Runtime (out of memory)
[09/10/2025-04:15:28] [TRT] [E] [wtsEngineRtUtils.cpp::executeWtsEngine::159] Error Code 2: OutOfMemory (Requested size was 512 bytes.)

Thanks,

Pablo

P.S.: I also tried using the trtexec tool to do the ONNX-to-engine YOLOv8 conversion, with the same OutOfMemory error. The command line is as follows; I also tried smaller and larger memory pool sizes without any impact on the error.
/usr/src/tensorrt/bin/trtexec \
    --onnx=/home/intelliview/git/hvr7/SmrtHVR/3rdParty/YoloData/Sources/yolov8n_1_3_640_640.onnx \
    --saveEngine=yolov8n_1_3_640_640_jp62.engine \
    --fp16 \
    --memPoolSize=workspace:512M

Is the GPU memory usage reported by jtop related to the trtexec memPoolSize in this way? Even when I lowered the pool to 128 MB, jtop reported much higher GPU memory usage, around 420 MB, climbing to around 520 MB when trtexec hits the error and stops.

The memory problems were likely due to fragmentation, although I am not certain of their origin; it did not matter how small I made the batch sizes.

I cleaned up the memory with a reboot, then stopped and uninstalled our C++ back-end programs and rebooted the machine again. I updated the C++ code that generates the YOLOv8 engine files, rebuilt the C++ services against OpenCV 4.10 without opencv_world, and started testing.

I can run our TensorRTGenerator C++ program with TensorRT 10.3, which means I can create engine files with FP16 weights from their ONNX versions. I created YOLOv8 and YOLOv11 engine files.

I was hopeful and ran our application to find the same error I saw a couple of days ago:

LLVM ERROR: out of memory

The engine files load without errors, but within one second of program execution the first error appears and the service crashes. Very little information is available about where it happens, so I will enable core dumps. Even in debug the program now crashes; the operating system sends it SIGABRT. This same code runs on JetPack 5.1.2.
Can I downgrade to a more stable JetPack 6.x?

Pablo

@DaneLLL I would consider this question closed, because the subject was building OpenCV with CUDA support on JetPack 6.2.

There are two potential sources for the LLVM OOM error I was observing:

  1. A potential mismatch between the TensorRT and CUDA versions used to create the engine files and the actual versions found at runtime.
  2. Errors in the lifetimes of the runtime (IRuntime) and engine (ICudaEngine) instances in the TensorRTGenerator program we use. The runtime must outlive the engine it creates; if this is not done correctly, memory errors can surface later when the engine file is used (see the sketch below).
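
To illustrate point 2, here is a minimal sketch of the ownership we settled on (simplified; not the actual TensorRTGenerator code). The engine is deleted before the runtime that deserialized it:

#include <NvInfer.h>
#include <cstddef>

// Keep the IRuntime and the ICudaEngine together, and destroy the engine
// before the runtime it came from.
struct EngineHolder {
    nvinfer1::IRuntime* runtime = nullptr;
    nvinfer1::ICudaEngine* engine = nullptr;

    bool load(nvinfer1::ILogger& logger, const void* blob, std::size_t size) {
        runtime = nvinfer1::createInferRuntime(logger);
        engine = runtime ? runtime->deserializeCudaEngine(blob, size) : nullptr;
        return engine != nullptr;
    }

    ~EngineHolder() {
        delete engine;   // engine first...
        delete runtime;  // ...then the runtime it depends on
    }
};

// The broken pattern was the opposite: a factory function created a local
// IRuntime, returned only the ICudaEngine, and let the runtime go out of
// scope while the engine was still in use.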

After fixing all of these pieces, I still have remaining fires to put out in our application. Work will continue for a few days until everything works as well as, or better than, on the previous JetPack.

So, to close this post: OpenCV 4.10 was built according to the suggested script. We removed support for opencv_world, but switching to the modular libraries with pkg-config (e.g. pkg-config --cflags --libs opencv4) was straightforward. I don’t see core dumps any more; one of our applications is still stopping, but now it does so gracefully, without memory corruption. All our C++ code builds and runs, to the point where I can dig deeper into the migration problems I expect as a normal part of the process.

I will post any findings if they are relevant to this forum under a new topic.

Thanks for the guidance,

Pablo
