OpenCV 4.9.0 Build with CUDA Failed on AGX Orin (JetPack 6.1) with Previously Provided Script

thomasc12 · November 12, 2024, 9:25pm

First of all, I have already read "Build opencv with cuda for Jetson AGX Orin but failed" and "如何使用GPU版OpenCV？" ("How do I use the GPU version of OpenCV?"). In both threads the solution seems very simple: just run the shell script provided.

However, the shell script given as the answer in both of those posts did not work for me. The make step failed at 20% progress:

7 errors detected in the compilation of "/home/spt/Downloads/workspace/opencv_contrib-4.9.0/modules/cudaarithm/src/cuda/minmax.cu".
CMake Error at cuda_compile_1_generated_minmax.cu.o.RELEASE.cmake:282 (message):
Error generating file
/home/spt/Downloads/workspace/opencv-4.9.0/release/modules/cudaarithm/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_minmax.cu.o

17 errors detected in the compilation of "/home/spt/Downloads/workspace/opencv_contrib-4.9.0/modules/cudaarithm/src/cuda/minmaxloc.cu".
CMake Error at cuda_compile_1_generated_minmaxloc.cu.o.RELEASE.cmake:282 (message):
Error generating file
/home/spt/Downloads/workspace/opencv-4.9.0/release/modules/cudaarithm/CMakeFiles/cuda_compile_1.dir/src/cuda/./cuda_compile_1_generated_minmaxloc.cu.o

I have attached the full log here.

opencv4.9.0_install.log (1.2 MB)
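For a log of this size, it can help to list only the CUDA sources that nvcc gave up on. The following is a minimal sketch, assuming the log contains summary lines in the same form as the two quoted above; the function name `failed_cuda_sources` and the `/tmp/src/...` sample paths are made up for illustration.

```python
import re

# Matches nvcc summary lines such as:
#   7 errors detected in the compilation of "/path/to/file.cu".
# Straight and typographic quotes are both accepted, because forum
# software sometimes converts one into the other.
NVCC_SUMMARY = re.compile(
    r'(\d+) errors? detected in the compilation of ["\u201c](.+?)["\u201d]'
)


def failed_cuda_sources(log_text):
    """Return (error_count, source_path) pairs, one per nvcc failure summary."""
    return [(int(count), path) for count, path in NVCC_SUMMARY.findall(log_text)]


if __name__ == "__main__":
    # Hypothetical sample text; in practice, read the real log file instead.
    sample = (
        '7 errors detected in the compilation of "/tmp/src/minmax.cu".\n'
        "CMake Error at cuda_compile_1_generated_minmax.cu.o.RELEASE.cmake:282 (message):\n"
        '17 errors detected in the compilation of "/tmp/src/minmaxloc.cu".\n'
    )
    for count, path in failed_cuda_sources(sample):
        print(f"{count:>3} errors  {path}")
```

To run it on the attached log, replace `sample` with the contents of the file, for example `open("opencv4.9.0_install.log").read()`.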

Please point out any step I am missing before running the script. I was under the impression that it is a self-contained, push-button script. The hardware that I am using is the Jetson AGX Orin 64GB Dev Kit.

AastaLLL · November 13, 2024, 2:34am

Hi,

Thanks for reporting this.
We will give it a try and provide more info to you later.

Thanks.

AastaLLL · November 13, 2024, 6:04am

Hi,

Please try the below script again.
https://github.com/AastaNV/JEP/blob/master/script/install_opencv4.10.0_Jetpack6.1.sh

We have confirmed it can build and run as expected:

$ python3
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.10.0'
>>> cv2.cuda.printCudaDeviceInfo(0)
*** CUDA Device Query (Runtime API) version (CUDART static linking) *** 

Device count: 1

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          12.60 / 12.60
  CUDA Capability Major/Minor version number:    8.7
  Total amount of global memory:                 62841 MBytes (65893351424 bytes)
  GPU Clock Speed:                               1.30 GHz
  Max Texture Dimension Size (x,y,z)             1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
  Max Layered Texture Size (dim) x layers        1D=(32768) x 2048, 2D=(32768,32768) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) 

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 12.60, CUDA Runtime Version = 12.60, NumDevs = 1

>>>

Thanks.

thomasc12 · November 13, 2024, 6:36pm

Thanks, @AastaLLL. I can confirm that the script ran to completion without issue this time. I noticed that the change in the shell script was minimal: the OpenCV version went from 4.9.0 to 4.10.0, and that was it. Does that mean JetPack 6.1 doesn’t support OpenCV 4.9.0? Is there a compatibility matrix somewhere that I can look up?

I then opened the Python 3 interpreter, but cv2 reported that it was compiled without CUDA support:

$ python3
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.10.0'
>>> cv2.cuda.printCudaDeviceInfo(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cv2.error: OpenCV(4.10.0) /io/opencv/modules/core/include/opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function 'throw_no_cuda'

I took a closer look at your shell script and found that line 61 seems problematic:

echo 'export PYTHONPATH=/usr/local/lib/python3.10/site-packages/:$PYTHONPATH' >> ~/.bashrc

On my AGX Orin, that path does not exist:

$ ls /usr/local/lib/python3.10
dist-packages

Under the dist-packages subfolder, I was able to locate cv2:

$ ls /usr/local/lib/python3.10/dist-packages/
cv2 jetson_stats-4.2.12.dist-info jtop numpy numpy-1.26.1.dist-info numpy.libs smbus2 smbus2-0.5.0.dist-info

I modified the $PYTHONPATH variable by replacing site-packages with dist-packages in my ~/.bashrc. After re-sourcing the .bashrc script, I was able to see CUDA device info in Python3, just like what you showed.

Should line 61 in the shell script be modified?
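When the import succeeds but CUDA is missing, it is worth checking which copy of cv2 Python actually loads. The `/io/opencv/...` path in the traceback does not look like a path on the Jetson, which suggests (this is an assumption, not something confirmed in this thread) that a different cv2 build was picked up first. Below is a minimal sketch of such a check; the helper names `module_origin` and `report_cv2` are made up for illustration.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load `name` from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def report_cv2():
    """Print where cv2 comes from and whether that build can see a CUDA device."""
    origin = module_origin("cv2")
    if origin is None:
        print("cv2 is not on sys.path")
        return
    print("cv2 would be loaded from:", origin)
    try:
        import cv2
        # getCudaEnabledDeviceCount() returns 0 when OpenCV was built without CUDA.
        print("CUDA-enabled devices:", cv2.cuda.getCudaEnabledDeviceCount())
    except Exception as exc:  # keep the report going even if this build is broken
        print("could not query CUDA support:", exc)


if __name__ == "__main__":
    report_cv2()
```

If the reported location is not under `/usr/local/lib/python3.10/dist-packages/`, another installation is shadowing the build produced by the script.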

AastaLLL · November 14, 2024, 5:55am

Hi,

Thanks a lot. We will give it a check.

JetPack 6.1 uses CUDA 12.6, and building OpenCV against CUDA 12.4 or later requires the fix tracked in the issue below, so we moved the script to 4.10.0 to pick up that fix.

github.com/opencv/opencv_contrib

CUDA Toolkit 12.4.0 `tuple` incompatibility

opened 06:41PM - 08 Mar 24 UTC

closed 12:47PM - 30 May 24 UTC

runer112

Labels: bug, category: build/install, category: cuda

##### System information (version)

- OpenCV => 4.9.0
- Operating System / Platform => Windows 64 Bit
- Compiler => Visual Studio 2022

##### Detailed description

opencv with CUDA support cannot be built using CUDA Toolkit 12.4.0. While CUDA Toolkit 12.3.2 uses thrust version 2.2.0 (https://docs.nvidia.com/cuda/archive/12.3.2/cuda-toolkit-release-notes/index.html), CUDA Toolkit 12.4.0 updates to thrust version 2.3.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html). In thrust version 2.3.0, the tuple implementation was replaced with a standard tuple implementation (https://github.com/NVIDIA/cccl/pull/262). Notably, this changes the definition from a 10-parameter template to a variable-parameter template. So instead of a tuple of _n_ items being padded out with _10 - n_ null types to always have 10 template parameters, it now only has _n_ template parameters. This makes the function templates in cudev specified with 10 template parameters per tuple no longer viable for tuples not of size 10.

An example of one such function template that's no longer viable, `cv::cudev::blockReduce`: https://github.com/opencv/opencv_contrib/blob/6b5142ff657ca676ab35233556b49a532e75e2b7/modules/cudev/include/opencv2/cudev/block/reduce.hpp#L68-L81

An example of an error I encounter:

```
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error : no instance of overloaded function "cv::cudev::blockReduce" matches the argument list [Z:\dev\1\opencv\out\build\user\modules\world\opencv_world.vcxproj]
[build] argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
[build] blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp));
[build] ^
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(72): note #3327-D: candidate function template "cv::cudev::blockReduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, uint, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
[build] __declspec(__device__) __forceinline void blockReduce(const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
[build] ^
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(63): note #3327-D: candidate function template "cv::cudev::blockReduce<N,T,Op>(volatile T *, T &, uint, const Op &)" failed deduction
[build] __declspec(__device__) __forceinline void blockReduce(volatile T* smem, T& val, uint tid, const Op& op)
[build] ^
[build]
[build] detected during:
[build] instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=int, BLOCK_SIZE=256]" at line 412
[build] instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 421
[build] instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 460
[build] instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 206 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build] instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 349 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build] instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 68 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
[build] instantiation of "void <unnamed>::minMaxImpl<T,R>(const cv::cuda::GpuMat &, const cv::cuda::GpuMat &, cv::cuda::GpuMat &, cv::cuda::Stream &) [with T=uchar, R=int]" at line 92 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
```

The first candidate but nonviable function template shown in the error message is the one linked above, which was viable and selected in previous CUDA Toolkit versions.

I think that all templates specifying 10 template parameters per tuple can be updated to work with the new tuple definition by replacing each set of 10 template parameters with a parameter pack. I think this should still be compatible with the old tuple definition, as well. For example, I think this would be a viable implementation of `cv::cudev::blockReduce`:

```cpp
template <int N, typename... P, typename... R, class... Op>
__device__ __forceinline__ void blockReduce(const tuple<P...>& smem,
                                            const tuple<R...>& val,
                                            uint tid,
                                            const tuple<Op...>& op)
{
    block_reduce_detail::Dispatcher<N>::reductor::template reduce<
        const tuple<P...>&,
        const tuple<R...>&,
        const tuple<Op...>&>(smem, val, tid, op);
}
```

##### Steps to reproduce

Attempt to build cudev using CUDA Toolkit 12.4.0. I suspect that this error will be observed with any combination of OpenCV version, OS, platform, and compiler (that are modern enough to not encounter some other error first).

##### Issue submission checklist

- [x] I report the issue, it's not a question
- [x] I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- [x] I updated to the latest OpenCV version and the issue is still there
- [x] There is reproducer code and related data files: videos, images, onnx, etc
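In the absence of a compatibility matrix, the rule of thumb from this thread can be written down as a small check. This is only a sketch under two assumptions taken from the posts above, not from official documentation: the break starts with CUDA Toolkit 12.4.0, and OpenCV 4.10.0 is the first release that carries the fix. The names `parse_version` and `likely_hits_tuple_error` are made up for illustration.

```python
def parse_version(text):
    """Turn '12.6' or '4.10.0' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Assumptions taken from this thread, not from an official compatibility matrix.
FIRST_CUDA_WITH_NEW_TUPLE = (12, 4)   # the linked issue reports the break with 12.4.0
FIRST_OPENCV_WITH_FIX = (4, 10, 0)    # the script moved to 4.10.0 to pick up the fix


def likely_hits_tuple_error(cuda_version, opencv_version):
    """Guess whether this CUDA/OpenCV pair runs into the cudev tuple build error."""
    cuda = parse_version(cuda_version)
    opencv = parse_version(opencv_version)
    return cuda >= FIRST_CUDA_WITH_NEW_TUPLE and opencv < FIRST_OPENCV_WITH_FIX


if __name__ == "__main__":
    # JetPack 6.1 ships CUDA 12.6 according to this thread.
    for opencv in ("4.9.0", "4.10.0"):
        print(opencv, likely_hits_tuple_error("12.6", opencv))
```

Comparing integer tuples rather than strings matters here: as plain strings, "4.9.0" sorts after "4.10.0", which would give the wrong answer.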

Thanks.

thomasc12 · November 26, 2024, 5:26pm

Hi,

Did you give it a check?

khanh.nguyen1 · November 29, 2024, 10:05am

I’ve also successfully installed OpenCV 4.10.0 using your script and can confirm that it builds and runs as expected.

My question is: if I want to use OpenCV inside a venv, do I have to activate the venv and run your script again? Is there any other way?

Thanks, Khanh

khanh.nguyen1 · November 29, 2024, 11:06am

Never mind. I created a new virtual environment with the --system-site-packages option, and I can now use OpenCV within the venv as well.
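For reference, the same kind of environment can also be created from Python with the standard-library `venv` module. This is a minimal sketch; the helper name `create_venv_with_system_packages` and the directory name `cv-env` are made up for illustration, and pip is left out of the environment to keep the example small.

```python
import tempfile
import venv
from pathlib import Path


def create_venv_with_system_packages(target_dir):
    """Create a venv that can also import system-wide packages; return its pyvenv.cfg path."""
    # system_site_packages=True corresponds to the --system-site-packages option.
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(str(target_dir))
    return Path(target_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    # A temporary directory is used here so the example cleans up after itself.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_venv_with_system_packages(Path(tmp) / "cv-env")
        print(cfg.read_text())
```

The generated `pyvenv.cfg` records the setting as `include-system-site-packages = true`, which is a quick way to check how an existing environment was created.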

Thanks, Khanh

system · December 13, 2024, 11:06am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
