<!--
If you have a question rather than reporting a bug please go to https://fo…rum.opencv.org where you get much faster responses.
If you need further assistance please read [How To Contribute](https://github.com/opencv/opencv/wiki/How_to_contribute).
This is a template helping you to create an issue which can be processed as quickly as possible. This is the bug reporting section for the OpenCV library.
-->
##### System information (version)
<!-- Example
- OpenCV => 4.2
- Operating System / Platform => Windows 64 Bit
- Compiler => Visual Studio 2017
-->
- OpenCV => 4.9.0
- Operating System / Platform => Windows 64 Bit
- Compiler => Visual Studio 2022
##### Detailed description
opencv with CUDA support cannot be built using CUDA Toolkit 12.4.0.
While CUDA Toolkit 12.3.2 uses thrust version 2.2.0 (https://docs.nvidia.com/cuda/archive/12.3.2/cuda-toolkit-release-notes/index.html), CUDA Toolkit 12.4.0 updates to thrust version 2.3.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html). In thrust version 2.3.0, the tuple implementation was replaced with a standard tuple implementaton (https://github.com/NVIDIA/cccl/pull/262). Notably, this changes the definition from a 10-parameter template to a variable-parameter template. So instead of a tuple of _n_ items being padded out with _10 - n_ null types to always have 10 template parameters, it now only has _n_ template parameters. This makes the function templates in cudev specified with 10 template parameters per tuple no longer viable for tuples not of size 10.
An example of one such function template that's no longer viable, `cv::cudev::blockReduce`:
https://github.com/opencv/opencv_contrib/blob/6b5142ff657ca676ab35233556b49a532e75e2b7/modules/cudev/include/opencv2/cudev/block/reduce.hpp#L68-L81
An example of an error I encounter:
```
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error : no instance of overloaded function "cv::cudev::blockReduce" matches the argument list [Z:\dev\1\opencv\out\build\user\modules\world\opencv_world.vcxproj]
[build] argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
[build] blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp));
[build] ^
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(72): note #3327-D: candidate function template "cv::cudev::blockReduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, uint, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
[build] __declspec(__device__) __forceinline void blockReduce(const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
[build] ^
[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(63): note #3327-D: candidate function template "cv::cudev::blockReduce<N,T,Op>(volatile T *, T &, uint, const Op &)" failed deduction
[build] __declspec(__device__) __forceinline void blockReduce(volatile T* smem, T& val, uint tid, const Op& op)
[build] ^
[build] detected during:
[build] instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=int, BLOCK_SIZE=256]" at line 412
[build] instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 421
[build] instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 460
[build] instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 206 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build] instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 349 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build] instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 68 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
[build] instantiation of "void <unnamed>::minMaxImpl<T,R>(const cv::cuda::GpuMat &, const cv::cuda::GpuMat &, cv::cuda::GpuMat &, cv::cuda::Stream &) [with T=uchar, R=int]" at line 92 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
```
The first candidate but nonviable function template shown in the error message is the one linked above, which was viable and selected in previous CUDA Toolkit versions.
I think that all templates specifying 10 template parameters per tuple can be updated to work with the new tuple definition by replacing each set of 10 template parameters with a parameter pack. I think this should still be compatible with the old tuple definition, as well. For example, I think this would be a viable implementation of `cv::cudev::blockReduce`:
```cpp
template <int N, typename... P, typename... R, class... Op>
__device__ __forceinline__ void blockReduce(const tuple<P...>& smem,
const tuple<R...>& val,
uint tid,
const tuple<Op...>& op)
{
block_reduce_detail::Dispatcher<N>::reductor::template reduce<
const tuple<P...>&,
const tuple<R...>&,
const tuple<Op...>&>(smem, val, tid, op);
}
```
##### Steps to reproduce
Attempt to build cudev using CUDA Toolkit 12.4.0. I suspect that this error will be observed with any combination of OpenCV version, OS, platform, and compiler (that are modern enough to not encounter some other error first).
##### Issue submission checklist
- [x] I report the issue, it's not a question
<!--
OpenCV team works with forum.opencv.org, Stack Overflow and other communities
to discuss problems. Tickets with questions without a real issue statement will be
closed.
-->
- [x] I checked the problem with documentation, FAQ, open issues,
forum.opencv.org, Stack Overflow, etc and have not found any solution
<!--
Places to check:
* OpenCV documentation: https://docs.opencv.org
* FAQ page: https://github.com/opencv/opencv/wiki/FAQ
* OpenCV forum: https://forum.opencv.org
* OpenCV issue tracker: https://github.com/opencv/opencv/issues?q=is%3Aissue
* Stack Overflow branch: https://stackoverflow.com/questions/tagged/opencv
-->
- [x] I updated to the latest OpenCV version and the issue is still there
<!--
master branch for OpenCV 4.x and 3.4 branch for OpenCV 3.x releases.
OpenCV team supports only the latest release for each branch.
The ticket is closed if the problem is not reproduced with the modern version.
-->
- [x] There is reproducer code and related data files: videos, images, onnx, etc
<!--
The best reproducer -- test case for OpenCV that we can add to the library.
Recommendations for media files and binary files:
* Try to reproduce the issue with images and videos in opencv_extra repository
to reduce attachment size
* Use PNG for images, if you report some CV related bug, but not image reader
issue
* Attach the image as an archive to the ticket, if you report some reader issue.
Image hosting services compress images and it breaks the repro code.
* Provide ONNX file for some public model or ONNX file with random weights,
if you report ONNX parsing or handling issue. Architecture details diagram
from netron tool can be very useful too. See https://lutzroeder.github.io/netron/
-->