Program hit cudaErrorIllegalAddress (error 700) [...] on CUDA API call to cudaDeviceSynchronize

xxliolauxx · June 2, 2021, 1:47pm

Hi there

My CUDA program crashes consistently for large inputs and occasionally for small ones.
I used CUDA-MEMCHECK to look for out-of-bounds memory accesses and fixed the ones I found.
I am still getting crashes, however. CUDA-MEMCHECK reports them as occurring inside cudaDeviceSynchronize, and Nsight reports the same error in cuCtxSynchronize.
I’ve run out of debugging options, so I’d be very happy for any advice on how to debug this.

Thanks,
Joel

Full CUDA-MEMCHECK output:
========= CUDA-MEMCHECK
PASSED ebs_copy_test
PASSED ebs_num_test
Allocating Memory…
Initializing Reference Sequence…
Allocating Memory (171B) for 9 Reads
Initializing Reads…
Starting Kernel…
========= Error: process didn’t terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under a host debugger to catch such errors.
========= Program hit cudaErrorIllegalAddress (error 700) due to “an illegal memory access was encountered” on CUDA API call to cudaDeviceSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvmdi.inf_amd64_b5c7e9f1cc7d29c6\nvcuda64.dll (cuProfilerStop + 0x9da58) [0x2ccdb8]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvmdi.inf_amd64_b5c7e9f1cc7d29c6\nvcuda64.dll (cuProfilerStop + 0xa011a) [0x2cf47a]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvmdi.inf_amd64_b5c7e9f1cc7d29c6\nvcuda64.dll [0x8035e]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvmdi.inf_amd64_b5c7e9f1cc7d29c6\nvcuda64.dll (cuProfilerStop + 0x1229fa) [0x351d5a]
========= Host Frame:C:\Windows\system32\DriverStore\FileRepository\nvmdi.inf_amd64_b5c7e9f1cc7d29c6\nvcuda64.dll (cuProfilerStop + 0x13db82) [0x36cee2]
========= Host Frame:C:\Users\joel\source\repos\genasm-gpu\genasm_gpu.exe (cudart::cudaApiChooseDevice + 0x41) [0x18e1]
========= Host Frame:C:\Users\joel\source\repos\genasm-gpu\genasm_gpu.exe (cudart::cudaApiStreamEndCapture_ptsz + 0x33) [0x10703]
========= Host Frame:C:\Users\joel\source\repos\genasm-gpu\genasm_gpu.exe (cudaGetErrorName + 0x15) [0x18305]
========= Host Frame:C:\Users\joel\source\repos\genasm-gpu\genasm_gpu.exe (cudaGraphExecKernelNodeSetParams + 0x3) [0x1c2d3]
========= Host Frame:C:\Users\joel\source\repos\genasm-gpu\genasm_gpu.exe (cudaHostAlloc + 0x124) [0x20514]
========= Host Frame:C:\Windows\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x17034]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x52651]
=========
========= No CUDA-MEMCHECK results found

achartiernv · September 27, 2021, 5:35pm

Hi, could you provide the source code for this issue?

xxliolauxx · September 27, 2021, 7:43pm

Hi
Thanks for the reply, I completely forgot about this post…

Through trial and error I eventually figured out that it was an out-of-bounds access by a GPU kernel into a large block (2GB) of unified memory. My best guess is that MEMCHECK cannot deal with such large memory blocks, since the error message above is not very helpful.
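
For anyone hitting the same symptom, here is a minimal sketch of the general pattern. It is not the code from this project, and the names `fill_bytes` and `fill_managed` are hypothetical. It assumes a byte buffer allocated with cudaMallocManaged and shows two habits that help with large buffers: computing the index in 64 bits and guarding it against the buffer length, and checking for errors both right after the launch and after cudaDeviceSynchronize. Kernel execution errors are reported asynchronously, which is why error 700 shows up at the synchronize call and not at the launch.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): writes `value` to every byte of buf.
__global__ void fill_bytes(unsigned char* buf, size_t n, unsigned char value)
{
    // Compute the index in 64 bits. The 32-bit unsigned product
    // blockIdx.x * blockDim.x wraps modulo 2^32, and storing it in an int
    // typically goes negative above 2^31 - 1.
    size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;

    // The grid is rounded up to whole blocks, so the last block can contain
    // threads past the end of the buffer; they must not touch memory.
    if (i < n) {
        buf[i] = value;
    }
}

// Hypothetical host helper: allocates n bytes of unified memory, fills them on
// the GPU, and copies the first and last byte back through the out-parameters.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t fill_managed(size_t n, unsigned char value,
                         unsigned char* first, unsigned char* last)
{
    if (n == 0 || first == nullptr || last == nullptr) {
        return cudaErrorInvalidValue;
    }

    unsigned char* buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, n);
    if (err != cudaSuccess) {
        return err;
    }

    const unsigned int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((n + threads - 1) / threads);
    fill_bytes<<<blocks, threads>>>(buf, n, value);

    // Launch-time problems (for example an invalid grid size) show up here.
    err = cudaGetLastError();

    // Errors raised while the kernel runs, such as an illegal address
    // (error 700), are only reported once the host synchronizes.
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    // Reading unified memory from the host is only safe after the sync.
    if (err == cudaSuccess) {
        *first = buf[0];
        *last = buf[n - 1];
    }

    cudaFree(buf);
    return err;
}
```

When the kernel stays in bounds, `fill_managed(n, 7, &first, &last)` returns `cudaSuccess` and both outputs hold 7. An out-of-bounds access inside the kernel would typically come back from the cudaDeviceSynchronize call instead.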

If helpful, I can provide the corresponding version of the source code. Since this is part of unpublished research work, I cannot do this publicly (yet).

Thanks,
Joel

achartiernv · September 29, 2021, 8:45pm

If you get a chance, please try the compute-sanitizer tool as a drop-in replacement for cuda-memcheck.

xxliolauxx · September 29, 2021, 8:51pm

Will do, thanks!
