cuda-gdb conditional breakpoint causes program to stop making progress

In this example, I use cuda-gdb to set a conditional breakpoint in a CUDA kernel. For simplicity’s sake, I chose a condition that can never evaluate true, but in general it could be any condition that simply won’t be true for a while (e.g. a condition on the last element of an input vector). The kernel stops making progress and appears to hang: this program, which normally completes very quickly, never completes.

Here’s an example program (just another vectorAdd):

#include <stdio.h>

__global__ void VecInc(float*  vec,
                       int     N)
{
   int i = blockDim.x * blockIdx.x + threadIdx.x;
   if (i < N) {
      vec[i] = vec[i] + 1.0;
   }
}

#define blockCount   256
#define numBlocks     20

int main(int argc, char** argv)
{
   float*  h_vec;
   cudaHostAlloc((void**)&h_vec, numBlocks * blockCount * sizeof(float),
                 cudaHostAllocPortable | cudaHostAllocWriteCombined);
   float*  d_vec;
   cudaMalloc((void**)&d_vec, numBlocks * blockCount * sizeof(float));
   cudaMemcpy(d_vec, h_vec, numBlocks * blockCount * sizeof(float),
              cudaMemcpyHostToDevice);

   VecInc<<<numBlocks, blockCount>>>(d_vec, numBlocks * blockCount);
   cudaError_t err = cudaGetLastError();
   if (err != cudaSuccess) {
      fprintf(stderr, "kernel launch failure: %s (%d)\n",
              cudaGetErrorString(err), err);
   }
   cudaDeviceSynchronize();
   err = cudaGetLastError();
   if (err != cudaSuccess) {
      fprintf(stderr, "kernel failure: %s (%d)\n", cudaGetErrorString(err), err);
   }
   cudaMemcpy(h_vec, d_vec, numBlocks * blockCount * sizeof(float), cudaMemcpyDeviceToHost);
   return 0;
}

And its makefile:

NVCC = /usr/local/cuda/bin/nvcc
NVCCGEN = -gencode=arch=compute_53,code=\"sm_53,compute_53\"
NVCCFLAGS = $(NVCCGEN) --compiler-options -fno-strict-aliasing -DUNIX -g -G -rdc=true
NVCCCOMP = $(NVCC) $(NVCCFLAGS) $(TARGET_ARCH)
NVLINK = $(NVCC) $(NVCCGEN) -rdc=true

it: it.o
	$(NVLINK) $^ $(LD35LIBS) -o $@

it.o: it.cu
	$(NVCCCOMP) -c $<

clean:
	rm -f it it.o

Build this and then debug it with cuda-gdb:

/usr/local/cuda/bin/cuda-gdb ./it
NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-elf-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /mag1/todd/test/detach/it...done.
(cuda-gdb) set cuda software_preemption on
(cuda-gdb) b 8 if 0
Breakpoint 1 at 0x403824: file, line 8.
(cuda-gdb) r
Starting program: /mag1/todd/test/detach/./it 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/".
[New Thread 0x20016081e0 (LWP 23328)]
[New Thread 0x20018081e0 (LWP 23329)]
[Thread 0x20018081e0 (LWP 23329) exited]

That is as far as it will get. Now, this vectorAdd has only 20*256 = 5120 elements. So even if the debugger is stopping each warp, evaluating the condition, and then resuming, and therefore may be slow, it shouldn’t take too terribly long.
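As a possible workaround while conditional breakpoints hang (my own sketch, not something from this thread): compile the rarely-true condition into the kernel itself, then set an ordinary breakpoint on the line inside the `if`, or call the device-side `__brkpt()` intrinsic there. That way cuda-gdb only has to stop when the condition actually holds, instead of evaluating it for every warp. The `DEBUG_COND` macro below is a hypothetical name:

```cuda
// Sketch: bake the rarely-true condition into the kernel so an
// unconditional breakpoint (or __brkpt()) fires only when it holds.
// DEBUG_COND is a hypothetical macro; substitute the condition of interest.
#define DEBUG_COND(i, N)  ((i) == (N) - 1)   // e.g. last element only

__global__ void VecInc(float* vec, int N)
{
   int i = blockDim.x * blockIdx.x + threadIdx.x;
   if (i < N) {
      if (DEBUG_COND(i, N)) {
         __brkpt();   // device-side breakpoint; alternatively set a
      }               // plain "b" on this line in cuda-gdb
      vec[i] = vec[i] + 1.0f;
   }
}
```

This trades rebuilding the binary for not having the debugger intercept all 5120 threads.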

I encountered a similar problem on Intel platforms with a specific card back in CUDA 6.5; it disappeared after that release. Perhaps this is related, or perhaps it’s just a similar symptom. But in case it’s helpful, it was reported here:

I am no longer able to access that link, but maybe you still can internally.

Oh, this worked in R24.1. It only started failing for me in R24.2.

Hi Todd Allen,

This issue appears to be caused by software_preemption being enabled, which is untested on mobile and can therefore be expected to fail in odd ways.
We are investigating the proper solution for this problem and will include the fix in an upcoming release.
Once we have a clear schedule, I will post an update here.
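Given that reply, an interim workaround (my assumption, not confirmed above) would be to run the same session without enabling software preemption, i.e. simply omit the `set cuda software_preemption on` step, since the option defaults to off:

```
/usr/local/cuda/bin/cuda-gdb ./it
(cuda-gdb) b 8 if 0
(cuda-gdb) r
```

Note that without software preemption, debugging on a system where the GPU also drives the display may have other limitations, so this is only a stopgap until the fix ships.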