Memcheck error, dynamic parallelism and address

Environment: Windows 10, GTX 1050 Ti, CUDA 12.1, Visual Studio 2017.
Part of the error output is posted below.
My questions are:

  1. Line 481 of codebook.cu is loop<<<grid_size, REDUCE_ADD_WIDTH, REDUCE_ADD_WIDTH * sizeof(uint64_t)>>>, so is the invalid global write located inside the __global__ function “loop”?
  2. Does “Address 0x180080bd60 is out of bounds” mean the program tried to write to 0x180080bd60? If so, how can I find out what tried to write to that address?
  3. Error 719 from cudaMemcpy and cudaFree only occurs under memcheck; otherwise no error is reported. So is this really an error, or is it something to do with TdrDelay?
========= Invalid __global__ write of size 4 bytes
=========     at 0x13c8 in G:/cuda/codebook.cu:481:solution(etc1s_optimizer_state_tag *, int, unsigned int, const rgba *, const unsigned int *, etc1s_optimizer_solution_coordinates_tag, etc1s_optimizer_potential_solution_tag *, etc1s_optimizer_potential_solution_tag *)
=========     by thread (2,0,0) in block (36,0,0)
=========     Address 0x180080bd60 is out of bounds
=========     and is 47,969 bytes after the nearest allocation at 0x1800800000 of size 512 bytes
=========     Device Frame:G:/cuda/codebook.cu:667:fit(unsigned int, etc1s_optimizer_state_tag *, encode_etc1s_param_struct_tag, unsigned int, const rgba *, const unsigned int *) [0xf90]
=========     Device Frame:G:/cuda/codebook.cu:740:cluster(encode_etc1s_param_struct_tag, const pixel_cluster_tag *, const rgba *, const unsigned int *, etc_block_tag *, int) [0x978]
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:cuEventRecordWithFlags [0x7ff972c54db8]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_c3352d3df1cf4d8c\nvcuda64.dll
......
......
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaMemcpy.
......
......
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaFree.

Thanks a lot.

  1. Compiling the code with -G or -lineinfo might provide a more precise location.
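  2. Yes. See the third line of the report (by thread (2,0,0) in block (36,0,0)) for the thread and block that caused the illegal access.
  3. This is an indirect effect: when the tool detects the error it aborts the kernel, and the later cudaMemcpy and cudaFree calls then report the launch failure.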
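
As a rough illustration of the points above (not the poster's code), here is a minimal sketch of a kernel with an index guard, plus an error check placed directly after the launch so that a failure is reported at the launch site rather than at a later cudaMemcpy or cudaFree. The names CUDA_CHECK, fill, run_guarded_fill and the file name oob_demo.cu are all hypothetical, and the build and run commands in the comments assume nvcc and compute-sanitizer are on the PATH.

```cuda
// Build with line information (assumed file name oob_demo.cu):
//   nvcc -lineinfo -o oob_demo oob_demo.cu     (or -G for full device debug info)
// Run under the checker (memcheck is the default tool):
//   compute-sanitizer --tool memcheck oob_demo
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line for any failing runtime call and
// return the error code. For brevity it does not free memory on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            return err_;                                              \
        }                                                             \
    } while (0)

// Each thread writes one element. The guard stops the surplus threads of
// the last block from writing past the end of the allocation.
__global__ void fill(unsigned int *out, unsigned int n, unsigned int value)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = value;
    }
}

// Allocates n elements, launches the guarded kernel, and checks for errors
// right after the launch. Assumes n > 0.
cudaError_t run_guarded_fill(unsigned int n)
{
    unsigned int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(unsigned int)));

    const unsigned int threads = 128;
    const unsigned int blocks = (n + threads - 1) / threads;  // round up
    fill<<<blocks, threads>>>(d_out, n, 7u);

    CUDA_CHECK(cudaGetLastError());        // launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_out));
    return cudaSuccess;
}
```

Calling run_guarded_fill(1000) from main should return cudaSuccess. If the if (i < n) guard is removed, the extra threads in the last block write past the allocation, and the checker would be expected to flag that with a thread/block line like the one in the report above.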

I’ve gotten around the “Invalid write” problem in a few ways, and will probably ask for help again, thank you very much for your help!

Thanks for letting us know! If there is a new issue, please file a new topic and we will do our best to help!