GPU memory dump with cuda-gdb python

Hello,

I have a script that I use for debugging OpenCV images. https://github.com/dtmoodie/gdb-imagewatch
This works correctly for host-side memory, but I'd like to extend it to device memory if possible.

On the host side, I can call gdb.inferiors()[0].read_memory(data_address, size) to dump process memory using the gdb Python interface. However, when I do this for device memory, I get all zeros as output.
Is this just not implemented in cuda-gdb or is there something else I need to call?

Hi, dtmoodie

I have raised this question with the relevant cuda-gdb developers. Let's wait for their response.

Best Regards
veraj

Hi, dtmoodie

Sorry for the late response.

Here is the workaround.

The following workaround allows reading device memory from Python. For example, in a sample application, breaking before the kernel launch (line 52), where 0x10207600000 is the address of the array on the device:

(cuda-gdb) l
47
48 HANDLE_CUDA_ERROR(cudaMalloc((void**)&d, sizeof(int)*N));
49 HANDLE_CUDA_ERROR(cudaEventCreate(&asyncWaitEvent));
50 HANDLE_CUDA_ERROR(cudaMemcpy(d, idata, sizeof(int)*N, cudaMemcpyHostToDevice));
51
52 bitreverse<<<1, N, N*sizeof(int)>>>(d);
53 HANDLE_CUDA_ERROR(cudaGetLastError());
54 HANDLE_CUDA_ERROR(cudaEventRecord(asyncWaitEvent, 0));
55
56 /* Spin on the host while kernel is running */
(cuda-gdb) p d
$3 = (void *) 0x10207600000
(cuda-gdb) python

def cuda_read_global_memory(addr):
    return gdb.parse_and_eval("*(@global unsigned int *)%s" % (addr))
print(cuda_read_global_memory(0x10207600000))
print(cuda_read_global_memory(0x10207600004))
print(cuda_read_global_memory(0x10207600008))
0
1
2
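For the imagewatch use case, the same workaround can be wrapped to pull a whole device buffer back as bytes. This is only a sketch building on the snippet above; read_device_buffer and its word-at-a-time loop are my own invention, not cuda-gdb API, and one parse_and_eval call per 32-bit word will be slow for large images:

```python
import struct

def read_device_buffer(addr, n_words, read_word=None):
    """Return n_words 32-bit words of device memory starting at addr,
    packed little-endian as bytes."""
    if read_word is None:
        import gdb  # only available when running inside (cuda-)gdb
        def read_word(a):
            # The @global specifier makes cuda-gdb treat the pointer
            # as device global memory rather than host memory.
            return int(gdb.parse_and_eval("*(@global unsigned int *)%s" % a))
    return struct.pack("<%dI" % n_words,
                       *(read_word(addr + 4 * i) for i in range(n_words)))
```

Inside cuda-gdb you could then call read_device_buffer(0x10207600000, 3) and hand the resulting bytes to something like numpy.frombuffer for display.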

BTW - the @local, @shared, @global, and @generic address specifiers all work as well. For more information on CUDA address specifiers, please look here:
http://docs.nvidia.com/cuda/cuda-gdb/#inspecting-program-state