Assertion failure at ../nvGpuDebugServer.c: Debugging Problem

I have a Tesla C1060 on 32-bit Fedora 10 with the latest drivers, CUDA 2.1, and cuda-gdb installed.
I know that cuda-gdb is only officially supported on RHEL 5.x, but I wanted to see whether it would run on my machine.

I took the bitreverse example from the cuda-gdb documentation and compiled it with “nvcc -g -G bitreverse.cu -o bitreverse”. It ran correctly on the GPU, both on its own and under cuda-gdb.

Then I made a small change: I added a declaration of a dynamically sized shared array, as you can see in the code below. This version still runs fine on the GPU, but when I run it under cuda-gdb I get the following error:

[indent]
[pvouzis@eratosthenes Bitreverse]$ cuda-gdb bitreverse
NVIDIA ® CUDA Debugger
Beta release
Portions Copyright © 2008 NVIDIA Corporation
GNU gdb 6.6
Copyright © 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB. Type “show warranty” for details.
This GDB was configured as “i686-pc-linux-gnu”…
Using host libthread_db library “/lib/libthread_db.so.1”.
(cuda-gdb) r
Starting program: /home/pvouzis/Documents/CUDA/CUDA_SDK2.1_linux32/myprojects/Bitreverse/bitreverse
[Thread debugging using libthread_db enabled]
[New process 19156]
[New Thread 1119952 (LWP 19156)]
Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r180/r180_00/drivers/gpgpu/cuda/src/gpgpucomp/lib/gpuDebug/nvGpuDebugServer.c, line 1139: Variable array not defined by module
Aborted[/indent]

Any ideas about this error? Is it due to my OS, or is something else wrong with the debugger?

I believe this example is easy to reproduce. If somebody on another OS (ideally RHEL) could try it, that would help me a lot.

I have been trying to get the debugger working for a few days now, and I am completely fed up with it.

P.S. I tried adding -G to the SDK Makefile to compile for the debugger, but I got other linking errors (I won't go into them here). Has anybody successfully used or modified the SDK Makefile for debugging with cuda-gdb? I am using my own Makefile for now, but it would be nice to use the SDK one.

[indent]#include <stdio.h>
#include <stdlib.h>

// Simple 8-bit bit reversal Compute test

#define N 256

__global__ void bitreverse(unsigned int *data)
{
    // Dynamically sized shared array: the only change from the
    // cuda-gdb documentation example (unused by the kernel)
    extern __shared__ int array[];
    unsigned int *idata = data;

    unsigned int x = idata[threadIdx.x];

    x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
    x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
    x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);

    idata[threadIdx.x] = x;
}

int main(void)
{
    unsigned int *d = NULL;
    int i;
    unsigned int idata[N], odata[N];

    for (i = 0; i < N; i++)
        idata[i] = (unsigned int)i;

    cudaMalloc((void**)&d, sizeof(int)*N);
    cudaMemcpy(d, idata, sizeof(int)*N, cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory for array[]
    bitreverse<<<1, N, 10>>>(d);

    // Copy the results back; the direction must be device-to-host
    cudaMemcpy(odata, d, sizeof(int)*N, cudaMemcpyDeviceToHost);

    for (i = 0; i < N; i++)
        printf("%u -> %u\n", idata[i], odata[i]);

    cudaFree((void*)d);
    return 0;
}[/indent]
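When device-side debugging is unavailable, one way to separate a kernel bug from a debugger or driver problem is to check the GPU output against a host reference. A minimal sketch, assuming the N = 256 setup above: `bitreverse_host` repeats the kernel's three mask-and-shift steps on the CPU, `reverse8_naive` reverses the low 8 bits one at a time as an independent cross-check, and `check_output` reports the first mismatching element. All three are hypothetical helper names, not part of CUDA or cuda-gdb; they are plain C, so they can sit in `bitreverse.cu` next to `main`.

```c
#include <stdio.h>

/* Hypothetical helper: the kernel's three swap steps, run on the host. */
static unsigned int bitreverse_host(unsigned int x)
{
    x = ((0xf0f0f0f0u & x) >> 4) | ((0x0f0f0f0fu & x) << 4); /* swap nibbles within each byte */
    x = ((0xccccccccu & x) >> 2) | ((0x33333333u & x) << 2); /* swap bit pairs within each nibble */
    x = ((0xaaaaaaaau & x) >> 1) | ((0x55555555u & x) << 1); /* swap adjacent bits */
    return x;
}

/* Hypothetical helper: reverse the low 8 bits one bit at a time. */
static unsigned int reverse8_naive(unsigned int x)
{
    unsigned int r = 0;
    int b;
    for (b = 0; b < 8; b++)
        if (x & (1u << b))
            r |= 1u << (7 - b);
    return r;
}

/* Hypothetical helper: returns the index of the first element where the
   GPU result differs from the host reference, or -1 if all n match. */
static int check_output(const unsigned int *in, const unsigned int *out, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (out[i] != bitreverse_host(in[i])) {
            printf("mismatch at %d: got %u, expected %u\n",
                   i, out[i], bitreverse_host(in[i]));
            return i;
        }
    }
    return -1;
}
```

After the device-to-host copy in `main`, a call such as `check_output(idata, odata, N)` returning -1 means every element matches. For inputs below 256 the two host functions agree, because each step only moves bits within a single byte.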

Thanks for reporting this. I’ve reproduced the assertion and opened bug 529561.

This is a known bug, and we’re working on a fix for the next CUDA release.

Thanks for taking care of this, but is there a way around it? I really need to debug on the device, because I am getting other “weird” results and I don’t know whether they come from another bug or from my code. Debugging on the host doesn’t help, because I don’t get identical results between the simulation and release builds.

Also, do you have an idea of when the next version of CUDA will be released, or is it possible to get a build once this bug is fixed?

Thanks.

Yes, you can attempt to work around this by changing the following:

extern __shared__ int array[]; -> __shared__ int array[10];

This also means that the third parameter of the kernel launch configuration should be removed, because the shared memory allocation is no longer dynamic:

bitreverse<<<1,N,10>>>(d); -> bitreverse<<<1,N>>>(d);

For this particular test case, array is not used, so there will be warnings at compile time. However, this should sidestep the assertion in cuda-gdb.

The next CUDA release will be in the Spring.