Assertion failure at ../nvGpuDebugServer.c: Debugging Problem

I have a Tesla C1060 on 32-bit Fedora 10 with the latest drivers, CUDA 2.1, and cuda-gdb installed.
I know that cuda-gdb is only guaranteed to work on RHEL 5.x, but I wanted to see if it would run on my machine.

I took the bitreverse example from the cuda-gdb documentation. I compiled it with “nvcc -g -G -o bitreverse”, and it ran normally on the GPU, both on its own and under cuda-gdb.

Then I made a small change: I added a declaration of a shared variable, as you can see in the code below. The modified code runs fine on the GPU, but when I run the same code under cuda-gdb I get the following error.

[pvouzis@eratosthenes Bitreverse]$ cuda-gdb bitreverse
(cuda-gdb) r
Starting program: /home/pvouzis/Documents/CUDA/CUDA_SDK2.1_linux32/myprojects/Bitreverse/bitreverse
[Thread debugging using libthread_db enabled]
[New process 19156]
[New Thread 1119952 (LWP 19156)]
Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r180/r180_00/drivers/gpgpu/cuda/src/gpgpucomp/lib/gpuDebug/nvGpuDebugServer.c, line 1139: Variable array not defined by module

Any ideas about this error? Is it due to my OS, or is there something else wrong with the debugger?

I believe you can reproduce this example easily on your machines. If somebody with another OS (ideally RHEL) could try it, that would help me a lot.

I have been trying to get the debugger to work for a few days now, and I am completely fed up with it.

P.S. I tried adding -G to the SDK Makefile to compile for the debugger, but I got other linking errors (I don’t want to elaborate now). Has anybody managed to use or alter the SDK Makefile successfully for debugging with cuda-gdb? I am using my own Makefile now, but it would be nice to use the SDK Makefile.

#include <stdio.h>
#include <stdlib.h>

// Simple 8-bit bit reversal Compute test

#define N 256

__global__ void bitreverse(unsigned int *data)
{
    // The added declaration: dynamically sized shared memory (unused here)
    extern __shared__ int array[];
    unsigned int *idata = data;

    unsigned int x = idata[threadIdx.x];

    // Swap nibbles, then bit pairs, then adjacent bits within each byte
    x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
    x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
    x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);

    idata[threadIdx.x] = x;
}

int main(void)
{
    unsigned int *d = NULL; int i;
    unsigned int idata[N], odata[N];

    for (i = 0; i < N; i++)
        idata[i] = (unsigned int)i;

    cudaMalloc((void**)&d, sizeof(int)*N);
    cudaMemcpy(d, idata, sizeof(int)*N,
               cudaMemcpyHostToDevice);

    // Third launch argument: bytes of dynamic shared memory
    bitreverse<<<1, N, 10>>>(d);

    cudaMemcpy(odata, d, sizeof(int)*N,
               cudaMemcpyDeviceToHost);

    for (i = 0; i < N; i++)
        printf("%u -> %u\n", idata[i], odata[i]);

    return 0;
}

Thanks for reporting this. I’ve replicated the assertion and opened bug 529561.

This is a known bug, and we’re working on a fix for the next CUDA release.

Thanks for taking care of this, but is there a way around it? I really need debugging on the device because I am getting other “weird” results and I don’t know if it is another bug or my code. Debugging on the host doesn’t help because I don’t get identical results between simulation and release code.

Also, is there an estimate of when the next version of CUDA will be released, or would it be possible to get a version of CUDA once this bug is fixed?


Yes, you can attempt to work around this by changing the following:

extern __shared__ int array[]; --> __shared__ int array[10];

This also means that the third launch-configuration argument (the dynamic shared memory size) should be removed from the kernel launch, as the shared memory allocation is no longer dynamic:

bitreverse<<<1,N,10>>>(d); --> bitreverse<<<1,N>>>(d);

For this particular test case, array is not used so there will be warnings upon compilation. However, this should sidestep the assert in cuda-gdb.
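For reference, here is a minimal sketch of the test case with both changes applied. The helper names reverse_byte_bits and run_bitreverse, the error checks, and the host-side comparison are assumptions added for illustration and are not part of the original example. Whether this avoids the assertion on a given setup still needs to be confirmed by running it under cuda-gdb.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 256

// Hypothetical helper: reverses the bits inside each byte of x.
// Built for host and device so the CPU can recompute the expected result.
__host__ __device__ unsigned int reverse_byte_bits(unsigned int x)
{
    x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
    x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
    x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);
    return x;
}

__global__ void bitreverse(unsigned int *data)
{
    // Workaround: statically sized, replacing `extern __shared__ int array[];`.
    // Still unused, so an unreferenced-variable warning is expected.
    __shared__ int array[10];

    data[threadIdx.x] = reverse_byte_bits(data[threadIdx.x]);
}

// Hypothetical helper: runs the kernel on the values 0..N-1 and copies the
// results into odata. Returns cudaSuccess or the first error encountered.
static cudaError_t run_bitreverse(unsigned int *odata)
{
    unsigned int idata[N];
    unsigned int *d = NULL;
    cudaError_t err;
    int i;

    for (i = 0; i < N; i++)
        idata[i] = (unsigned int)i;

    err = cudaMalloc((void**)&d, sizeof(unsigned int) * N);
    if (err != cudaSuccess)
        return err;

    err = cudaMemcpy(d, idata, sizeof(unsigned int) * N, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        bitreverse<<<1, N>>>(d);      // no third launch argument any more
        err = cudaGetLastError();     // reports launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(odata, d, sizeof(unsigned int) * N, cudaMemcpyDeviceToHost);

    cudaFree(d);
    return err;
}

int main(void)
{
    unsigned int odata[N];
    int i, mismatches = 0;

    cudaError_t err = run_bitreverse(odata);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare each device result against the same computation on the host
    for (i = 0; i < N; i++)
        if (odata[i] != reverse_byte_bits((unsigned int)i))
            mismatches++;

    printf("%d mismatches\n", mismatches);
    return mismatches != 0;
}
```

Comparing device output against a host recomputation like this can help separate a bug in your own kernel from a toolchain problem while device debugging is unavailable.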

The next CUDA release will be in the Spring.