I started using PGI and OpenACC only recently, and I’m working on accelerating a fairly large program that involves a lot of function calls. When running on accelerators, I keep encountering these two runtime errors:
call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
Rather than asking for help with a specific code segment, I was wondering if there’s a general way to debug the causes of these errors. I’ve tried using the PGI debugger, but it always stops and resets the program once it crashes, so I’m unable to locate the point in the code that causes the problem. So far I’ve only been able to resolve these errors by trial and error, and given the size of the code, finding their causes without debugging tools takes a very long time.
You can use cuda-gdb to debug the device code. Just compile with “-g” and you’ll be able to step into the generated CUDA kernels. Note, though, that cuda-gdb doesn’t understand Fortran. You can still use it with Fortran code, but some of the information it shows might be a bit off.
As for the actual “Illegal address” error, it is the device equivalent of a segmentation violation on the host: the device code is accessing an address it is not allowed to access. Common causes include out-of-bounds array accesses, null pointers, and host addresses being used on the device.
Some other things to try are:
- Build the program without OpenACC and then run under “valgrind” (http://www.valgrind.org) to detect any out-of-bounds errors or other memory issues.
- Compile the OpenACC regions to target the CPUs (-ta=multicore). You can then run this code in pgdbg to track down any issues with the parallel code.
- Compile with CUDA Unified Memory (-ta=tesla:managed). This will put all dynamic memory allocations in an address space accessible from both the host and the device. If the program then works, that indicates the original code is using a host address on the device, for example because some dynamically allocated data was never copied to the device.
- Set the environment variable PGI_ACC_NOTIFY=1 (or the more verbose PGI_ACC_DEBUG=1). This will print out all the kernel calls made to the device and help track down which one is failing.
Hope this helps,