I am using Windows 10, CUDA 11.8, and Visual Studio 2019 (C++).
I have a small command-line app that uses CUDA kernels and thrust::inclusive_scan. Everything works fine in debug mode.
I copied and pasted the thrust::inclusive_scan call into my 300,000-line C++ program.
In debug mode, at runtime, I get:
CUDA error 2 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cub\device\dispatch/dispatch_scan.cuh, 372]: out of memory
I am performing the scan on a 200 KB buffer.
Here is the funky bit.
When I compile the code in my standalone program, it compiles fine.
When I compile the code in my 300,000-line C++ program, I get these warning messages:
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\thrust/detail/alignment.h(139): warning C4324: 'thrust::detail::aligned_type<2>::type': structure was padded due to alignment specifier
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\thrust/detail/alignment.h(140): warning C4324: 'thrust::detail::aligned_type<4>::type': structure was padded due to alignment specifier
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\thrust/detail/alignment.h(141): warning C4324: 'thrust::detail::aligned_type<8>::type': structure was padded due to alignment specifier
I verified that both source files in both VS solutions are compiled the same way.
I am running on an A100 40GB card.
Any ideas why the program fails at runtime?
Any ideas why the compiler warnings are present?
–Bob