Trivial cuFFT causes cuda-memcheck errors on RTX 2070 SUPER

We have a fairly complicated simulation application that uses CUDA 10.1, including the cuFFT library, running under Windows 10 Pro 64-bit in WDDM mode. We added a new dual-GPU Alienware Aurora R9 with 2x RTX 2070 SUPER to our internal list of test machines, ran a test under cuda-memcheck, and started getting random illegal-instruction/illegal-address reports. After further testing and isolation, the problem appeared to come from the cuFFT layer: after creating a 2D FFT plan of 800x800 (C2C), any further kernel execution resulted in illegal-instruction/illegal-address reports from cuda-memcheck. Note that we never even got to executing the plan; just creating it caused the issues.

We couldn’t make progress resolving it in the full application, so I tried a simple test program under cuda-memcheck on the same box. Even there I am unable to create a 2D plan without cuda-memcheck failing with an internal error (see below). I am able to create/destroy a cuFFT handle, but if I try to plan anything it flags internal errors.

We have SDK 10.1 and 11.0 installed and have built/debugged/memcheck’d with both, with no differences seen. Driver 451.48 (DCH) on Windows version 1909 (OS Build 18363.900).

Has anyone seen this before? Any advice?

Sample code that fails (replacing cufftPlan2d with cufftCreate() makes the internal error go away):


cudaSetDevice(0);
cufftHandle plan = 0;
cufftPlan2d(&plan, 256, 256, CUFFT_C2C); // 512x512, 800x800, etc. all show the same internal error
cufftDestroy(plan);

========= Internal Memcheck Error: Initialization failed
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll (cuProfilerStop + 0x904ce) [0x2ae04e]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll (cuProfilerStop + 0x91f50) [0x2afad0]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll [0x7fdd0]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll [0x85f2e]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll (cuProfilerStop + 0x11473a) [0x3322ba]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll [0x1892ba]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll [0xa6d5b]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll [0xa6eb7]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvddi.inf_amd64_d270e5eea12c358c\nvcuda64.dll [0xa7a0c]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x65b3]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0xe1e9]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x54ab]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x8519]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x8181]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0xe973]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x14eb]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x328a]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x7b74e]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x7c9b3]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll [0x7c33d]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll (cufftXtMakePlanMany + 0x35a) [0x871aa]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll (cufftMakePlanMany64 + 0xb8) [0x87db8]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll (cufftMakePlanMany + 0x1af) [0x85ddf]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll (cufftPlanMany + 0xd2) [0x86122]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\cufft64_10.dll (cufftPlan1d + 0x54) [0x85e64]
========= Host Frame:C:\ii-vi\svn\wonat_development\wonat\src\Native_Testing\GPU_Testing\bin\x64\Release\GPU_Testing.exe (main + 0xd0) [0x11b0]
========= Host Frame:C:\ii-vi\svn\wonat_development\wonat\src\Native_Testing\GPU_Testing\bin\x64\Release\GPU_Testing.exe (__scrt_common_main_seh + 0x10c) [0x1554]
========= Host Frame:C:\Windows\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x17bd4]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6ce51]

========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 1 error

Cmd line (built w/ VS2017):


cuda-memcheck.exe --force-blocking-launches yes --leak-check full --racecheck-report all --report-api-errors all --check-deprecated-instr yes --print-level info --tool memcheck %BIN%

Bug #:


Bug and supporting details filed as bug #3050936

Thanks to @RaulPPelaez

Work-around:


CUDA_MEMCHECK_PATCH_MODULE=1

Reference:


https://docs.nvidia.com/cuda/cuda-memcheck/index.html

The CUDA-MEMCHECK tools can fail to initialize when there are a lot of CUDA functions in the target app. This is due to CUDA-MEMCHECK trying to find a subset of functions to patch and running out of memory. The environment variable CUDA_MEMCHECK_PATCH_MODULE can be set to 1 in order to bypass this behavior, thus resolving the initialization error.
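
For anyone who drives cuda-memcheck from a test harness rather than from a shell, the work-around can probably be applied by setting the variable in the harness process before spawning the tool, since child processes inherit the parent's environment. Below is a minimal sketch under that assumption. The helper names `enable_patch_module_workaround` and `run_under_memcheck` are made up for illustration and are not part of any NVIDIA API; the command string is whatever you already pass to cuda-memcheck.

```cpp
#include <cstdlib>   // std::system
#include <stdlib.h>  // setenv (POSIX) / _putenv_s (Windows CRT)
#include <string>

// Hypothetical helper: sets CUDA_MEMCHECK_PATCH_MODULE=1 in this process.
// Processes started afterwards inherit the variable.
// Returns true if the variable was set.
inline bool enable_patch_module_workaround() {
#ifdef _WIN32
    return _putenv_s("CUDA_MEMCHECK_PATCH_MODULE", "1") == 0;
#else
    return setenv("CUDA_MEMCHECK_PATCH_MODULE", "1", /*overwrite=*/1) == 0;
#endif
}

// Hypothetical helper: runs `command` (for example a complete cuda-memcheck
// command line) through the system shell with the work-around applied.
// Returns the result of std::system, or -1 if the variable could not be set.
inline int run_under_memcheck(const std::string& command) {
    if (!enable_patch_module_workaround()) {
        return -1;
    }
    return std::system(command.c_str());
}
```

A call such as `run_under_memcheck("cuda-memcheck.exe --tool memcheck GPU_Testing.exe")` would then launch the tool with the variable already set. From a plain cmd prompt, running `set CUDA_MEMCHECK_PATCH_MODULE=1` before the cuda-memcheck line has the same effect.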
