Assertion failure at ../nvGpuDebugServer.c Debugging Problem

pvouzis · March 7, 2009, 1:09am

I have a Tesla C1060 on 32-bit Fedora 10 with the latest drivers, CUDA 2.1, and cuda-gdb installed.
I know that cuda-gdb is guaranteed to work only on RHEL 5.x, but I wanted to see if it would run on my machine.

I took the bitreverse example from the cuda-gdb documentation and compiled it with "nvcc -g -G bitreverse.cu -o bitreverse". It executed normally on the GPU, both on its own and under cuda-gdb.

Then I made a small change: I added a declaration of a shared variable, as you can see in the code below. This code executes fine on the GPU, but when I run the same code under cuda-gdb I get the following error.

[indent]
[pvouzis@eratosthenes Bitreverse]$ cuda-gdb bitreverse
NVIDIA (R) CUDA Debugger
Beta release
Portions Copyright (C) 2008 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(cuda-gdb) r
Starting program: /home/pvouzis/Documents/CUDA/CUDA_SDK2.1_linux32/myprojects/Bitreverse/bitreverse
[Thread debugging using libthread_db enabled]
[New process 19156]
[New Thread 1119952 (LWP 19156)]
Assertion failure at /home/buildmeister/build/sw/rel/gpu_drv/r180/r180_00/drivers/gpgpu/cuda/src/gpgpucomp/lib/gpuDebug/nvGpuDebugServer.c, line 1139: Variable array not defined by module
Aborted[/indent]

Any ideas about this error? Is it due to my OS, or is something else wrong with the debugger?

I believe you can reproduce this example easily on your machines. If somebody with another OS (ideally RHEL) could try it, that would help me a lot.

I have been trying to get the debugger to work for a few days now, and I am completely fed up with it.

P.S. I tried adding -G to the SDK Makefile to compile for the debugger, but I got other linking errors (I don't want to elaborate now). Has anybody successfully used or altered the SDK Makefile for debugging with cuda-gdb? I am using my own Makefile for now, but it would be nice to use the SDK one.

[indent]
#include <stdio.h>
#include <stdlib.h>

// Simple 8-bit bit reversal Compute test

#define N 256

__global__ void bitreverse(unsigned int *data)
{
    // Added declaration: dynamically sized shared array (unused).
    // With this line present, cuda-gdb hits the assertion.
    extern __shared__ int array[];
    unsigned int *idata = data;

    unsigned int x = idata[threadIdx.x];

    // Reverse the bits within each byte: swap nibbles, then bit pairs, then single bits
    x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
    x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
    x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);

    idata[threadIdx.x] = x;
}

int main(void)
{
    unsigned int *d = NULL; int i;
    unsigned int idata[N], odata[N];

    for (i = 0; i < N; i++)
        idata[i] = (unsigned int)i;

    cudaMalloc((void**)&d, sizeof(int)*N);
    cudaMemcpy(d, idata, sizeof(int)*N,
               cudaMemcpyHostToDevice);

    // Third launch parameter: 10 bytes of dynamic shared memory
    bitreverse<<<1, N, 10>>>(d);

    // Copy the results back to the host (device-to-host, not host-to-device)
    cudaMemcpy(odata, d, sizeof(int)*N,
               cudaMemcpyDeviceToHost);

    for (i = 0; i < N; i++)
        printf("%u -> %u\n", idata[i], odata[i]);

    cudaFree((void*)d);
    return 0;
}
[/indent]

netllama · March 8, 2009, 9:34pm

Thanks for reporting this. I’ve replicated an assertion and opened bug 529561.

netllama · March 9, 2009, 9:58pm

This is a known bug, and we’re working on a fix for the next CUDA release.

pvouzis · March 10, 2009, 1:39pm

Thanks for taking care of this, but is there a way around it? I really need debugging on the device because I am getting other “weird” results and I don’t know if it is another bug or my code. Debugging on the host doesn’t help because I don’t get identical results between simulation and release code.

Also, is there any estimate of when the next version of CUDA will be released, or is it possible to get a build of CUDA once this bug is fixed?

Thanks.

netllama · March 11, 2009, 6:17pm

Yes, you can attempt to work around this by changing the following:

extern __shared__ int array[]; --> __shared__ int array[10];

Also, this means that the 3rd parameter to the kernel should be removed, as the shared memory allocation is no longer dynamic:

bitreverse<<<1,N,10>>>(d); --> bitreverse<<<1,N>>>(d);

For this particular test case, array is not used so there will be warnings upon compilation. However, this should sidestep the assert in cuda-gdb.

The next CUDA release will be in the Spring.
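
For reference, here is a minimal sketch of the test case with the suggested workaround applied: the shared array is statically sized and the third launch parameter is dropped. The helper name reverse_bits_in_bytes is hypothetical, introduced here only to keep the kernel short; it is not part of the original example. The copy back to the host uses cudaMemcpyDeviceToHost. This has not been verified against the cuda-gdb build discussed in this thread.

```cuda
#include <stdio.h>

#define N 256

// Hypothetical helper (not in the original post): reverses the bits
// within each byte of x. Callable from both host and device code.
__host__ __device__ unsigned int reverse_bits_in_bytes(unsigned int x)
{
    x = ((0xf0f0f0f0 & x) >> 4) | ((0x0f0f0f0f & x) << 4);
    x = ((0xcccccccc & x) >> 2) | ((0x33333333 & x) << 2);
    x = ((0xaaaaaaaa & x) >> 1) | ((0x55555555 & x) << 1);
    return x;
}

__global__ void bitreverse(unsigned int *data)
{
    // Statically sized shared array in place of
    // `extern __shared__ int array[];`. It is unused, as in the original
    // test case, so the compiler may warn about it.
    __shared__ int array[10];

    data[threadIdx.x] = reverse_bits_in_bytes(data[threadIdx.x]);
}

int main(void)
{
    unsigned int *d = NULL;
    unsigned int idata[N], odata[N];
    int i;

    for (i = 0; i < N; i++)
        idata[i] = (unsigned int)i;

    cudaMalloc((void**)&d, sizeof(unsigned int) * N);
    cudaMemcpy(d, idata, sizeof(unsigned int) * N, cudaMemcpyHostToDevice);

    // No third launch parameter: the shared allocation is no longer dynamic.
    bitreverse<<<1, N>>>(d);

    // Copy the results back from the device to the host.
    cudaMemcpy(odata, d, sizeof(unsigned int) * N, cudaMemcpyDeviceToHost);

    for (i = 0; i < N; i++)
        printf("%u -> %u\n", idata[i], odata[i]);

    cudaFree(d);
    return 0;
}
```

Compiling this the same way as before ("nvcc -g -G bitreverse.cu -o bitreverse") should allow the program to be loaded in cuda-gdb again, assuming the assertion is triggered only by the extern shared declaration.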
