Program received signal CUDA_EXCEPTION_10, Device Illegal Address.

Hello,

I am debugging a multi-GPU program which runs fine when only one GPU is engaged.
When multiple (2) GPUs are enabled, I encounter the error below in cuda-gdb. Once there, I cannot switch focus, and the backtrace is short. Could you please give me some hints? Where are STORE4() and NO_REST_OF_K() defined? And does "CUDA kernel 0" indicate the first kernel function registered from my program?

CUDA Exception: Device Illegal Address
The exception was triggered in device 1.

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching focus to CUDA kernel 0, grid 14, block (0,14,0), thread (0,0,0), device 1, sm 0, warp 0, lane 0]
0x00007f354b7648b0 in STORE4 ()
(cuda-gdb) bt
#0 0x00007f354b7648b0 in STORE4 ()
#1 0x00007f354b764490 in NO_REST_OF_K<<<(1,19,1),(16,16,1)>>> ()
(cuda-gdb) cuda block 1
Request cannot be satisfied. CUDA focus unchanged.
(cuda-gdb)
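
For reference, the focus variants I have tried look like this (a sketch of cuda-gdb syntax, not output from this session; my understanding is that kernel IDs in cuda-gdb are assigned by the debugger per active launch, not by the order kernels are registered in the program):

(cuda-gdb) info cuda kernels
(cuda-gdb) info cuda devices
(cuda-gdb) cuda kernel 0 block (0,14,0) thread (1,0,0)
(cuda-gdb) cuda device 1 sm 0 warp 0 lane 1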

Hi, xunlei

Can you tell me which 2 GPUs you are using? Which toolkit version?
If possible, please also provide a minimal sample.

Thanks!

Hi Veraj,
My machine has Tesla K40 GPUs and the CUDA version is 8.0. The code works fine in single-GPU mode. I suspect that I have a bug in the context/device switching, but I cannot see where. The code is compiled with the cuda-memcheck flags.

I am not sure how to get deeper stacks on the kernel focus.

CUDA Exception: Device Illegal Address
The exception was triggered in device 1.

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching focus to CUDA kernel 0, grid 14, block (0,14,0), thread (0,0,0), device 1, sm 0, warp 0, lane 0]
0x00007f95c722e5e0 in STORE4 ()
(cuda-gdb) bt
#0 0x00007f95c722e5e0 in STORE4 ()
#1 0x00007f95c722e1d0 in NO_REST_OF_K<<<(1,19,1),(16,16,1)>>> ()

info threads shows one problematic thread:

(cuda-gdb) thread 25
[Switching to thread 25 (Thread 0x7f962516c700 (LWP 40520))]
#0 0x00007f9656e7a2e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(cuda-gdb) bt
#0 0x00007f9656e7a2e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9656e755a3 in _L_lock_892 () from /lib64/libpthread.so.0
#2 0x00007f9656e75487 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f961e2054aa in cudbgReportDriverInternalError () from /usr/lib64/nvidia/libcuda.so.1
#4 0x00007f961e2908dc in cudbgApiDetach () from /usr/lib64/nvidia/libcuda.so.1
#5 0x00007f961e3b3256 in cuVDPAUCtxCreate () from /usr/lib64/nvidia/libcuda.so.1
#6 0x00007f961e1b6276 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#7 0x00007f961e1b64a3 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#8 0x00007f961e2faf65 in cuLaunchKernel () from /usr/lib64/nvidia/libcuda.so.1
#9 0x00007f962431d23d in cudaGetExportTable ()
from /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart.so.8.0
#10 0x00007f9624343783 in cudaLaunch ()
from /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart.so.8.0
#11 0x00007f961eae9858 in cudaLaunch (
func=0x7f961eae838a <Kernel_Activate_DTanH(ActivationParms)> "UH\211\345H\215}\020\350y\222\377\377\311\303UH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350\201\024")
at /usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:1819
#12 0x00007f961eae8385 in __device_stub__Z21Kernel_Activate_DTanH15ActivationParms (__par0=...)
at /tmp/tmpxft_00009abd_00000000-4_cudalib.compute_52.cudafe1.stub.c:1
#13 0x00007f961eae8397 in Kernel_Activate_DTanH (__cuda_0=…) at cudalib.cu:2619
#14 0x00007f961eae5e70 in Activate (parms=0x7f962515afc0, dummy=0x130c540000) at cudalib.cu:2983

And Kernel_Activate_DTanH() looks innocent enough to me.

__global__ void Kernel_Activate_DTanH(ActivationParms p)
{
    double* x = (double*)p.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    x[i] = tanh(x[i]);     // activate in place
    x[i] *= p.dropOut_1;   // apply dropout scaling
}
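
A sanity check I may add before the launch (a minimal sketch; the helper name is mine, and the only assumption is that p.x comes from cudaMalloc): cudaPointerGetAttributes() reports which device owns an allocation, so a mismatch with the launch device would explain the illegal address.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: warn if 'ptr' was not allocated on 'expectedDevice'.
static void CheckDevicePointer(const void* ptr, int expectedDevice)
{
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        printf("cudaPointerGetAttributes: %s\n", cudaGetErrorString(err));
        return;
    }
    // attr.device is the GPU that owns the allocation; a kernel launched on a
    // different device (without peer access) faults with an illegal address.
    if (attr.device != expectedDevice)
        printf("pointer %p belongs to device %d, expected %d\n",
               ptr, attr.device, expectedDevice);
}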

Thank you for your help.

The cuda-memcheck flags are added when compiling the code:

-Xcompiler -rdynamic -lineinfo
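
For completeness, the full compile line that I believe gives the most useful backtraces under cuda-gdb looks something like this (a sketch; the file and output names are illustrative, -g/-G enable host- and device-side debug info at the cost of device optimization, and -lineinfo is the lighter alternative to -G for optimized builds):

nvcc -g -G -Xcompiler -rdynamic -o app cudalib.cu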

Hi, xunlei

One of your GPUs is a Tesla K40. What about the other one? Also a K40? What is your display GPU?

Also, if you think it is a programming bug, I suggest you post the problem in the CUDA programming section, where you can get more help with programming details. Thanks!

Hi Veraj,

The machine has two K40 GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:08:00.0     Off |                    0 |
| N/A   29C    P0    60W / 235W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:24:00.0     Off |                  N/A |
| N/A   30C    P0    63W / 235W |      0MiB / 11439MiB |     54%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I turned on -rdynamic and -lineinfo when compiling. After several runs, I was able to see a meaningful stack at the moment of the crash. In the end, the bug was that the host data structure was copied only to GPU 0, while the troubled kernel was launched on GPU 1.
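
In other words, every device that launches the kernel needs its own allocation and its own copy of the host data. A minimal sketch of the corrected pattern (sizes and names are illustrative, not from the actual code):

#include <cuda_runtime.h>
#include <vector>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    const size_t n = 1 << 20;
    std::vector<double> host(n, 0.5);
    std::vector<double*> devX(deviceCount, nullptr);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                         // switch the runtime to this device
        cudaMalloc(&devX[dev], n * sizeof(double)); // allocation is owned by 'dev'
        // Copy to EVERY device that will run the kernel. Copying only to
        // device 0 leaves device 1's kernel dereferencing an address it
        // cannot access, which is exactly the illegal-address exception above.
        cudaMemcpy(devX[dev], host.data(), n * sizeof(double),
                   cudaMemcpyHostToDevice);
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaFree(devX[dev]);
    }
    return 0;
}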

Thank you all for the feedback and attention.