cuda-gdb can't access shared memory

Hello,

I’m having a problem debugging a test kernel that uses shared memory. All it does is create an NxN matrix and have each thread write a number to the block’s shared memory and then copy it to global memory for the host application to read. The program works; the only problem is that when I use cuda-gdb to see what is going on (not because anything goes wrong, just to experiment with cuda-gdb), I can’t read the data that was written to shared memory.

The relevant code excerpts follow (data_test points to the global memory where the results are written; it is allocated with cudaMallocPitch, and Parameters is a struct that contains the dimensions of the data_test matrix):

__global__ void kernel_test_shared(float* data_test, size_t pitch_test, Parameters alg_params)
{
        // Copy parameters to local variables
        int height = alg_params.height;
        int width = alg_params.width;

        // Initialize the shared memory matrix
        extern __shared__ char shared[];
        float *test_shared = (float*) shared;

        // Compute target element coordinates
        int x = blockIdx.x*blockDim.x + threadIdx.x;
        int y = blockIdx.y*blockDim.y + threadIdx.y;

        // Check limits (threads outside the matrix do nothing)
        if(x >= width || y >= height)
        {
                return;
        }

        // Write to the corresponding element in shared memory
        test_shared[threadIdx.y*blockDim.x + threadIdx.x] = y*width + x;

        // Wait for all threads in the block
        __syncthreads();

        // Copy to global memory (data_test is pitched, so rows are indexed by pitch in bytes)
        *(((float*) (((char*) data_test) + y*pitch_test)) + x) = test_shared[threadIdx.y*blockDim.x + threadIdx.x];
}

(I know the syntax for accessing the data_test array is awkward; I wrote it that way just to be sure I was indexing the pitched memory correctly.)
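For context, the host side is set up roughly like this (a simplified sketch with illustrative names and no error checking; the real code is longer). The point to note is that the dynamic shared memory size is passed as the third launch configuration parameter, which is what backs the "extern __shared__ char shared[]" declaration in the kernel:

#include <cuda_runtime.h>

struct Parameters
{
        int width;
        int height;
};

void run_test(int width, int height)
{
        // Fill in the parameter struct read by the kernel
        Parameters alg_params;
        alg_params.width  = width;
        alg_params.height = height;

        // Pitched allocation of the result matrix
        float* data_test = NULL;
        size_t pitch_test = 0;
        cudaMallocPitch((void**)&data_test, &pitch_test, width * sizeof(float), height);

        // One float per thread in the block's shared memory
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        size_t shared_bytes = block.x * block.y * sizeof(float);

        kernel_test_shared<<<grid, block, shared_bytes>>>(data_test, pitch_test, alg_params);
        cudaThreadSynchronize();

        // ... copy the matrix back with cudaMemcpy2D and check it on the host ...

        cudaFree(data_test);
}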

So, what happens here is that when I run cuda-gdb and try to access the shared or test_shared pointers, all I get is:

(cuda-gdb) p shared
$1 = 0x40 <Address 0x40 out of bounds>
(cuda-gdb) p test_shared
$2 = (@global float * @register) 0x40

/* step program until all threads write to shared memory */

(cuda-gdb) p test_shared[0]
$3 = 1.72208689e-22
(cuda-gdb) p test_shared[1]
$4 = 3.33029199

and the last two values look like garbage (by the way, if I try to examine address 0x40 directly, I get a “Cannot access memory at address 0x40” error).

As for versions: “nvcc --version” reports 3.2. I’m not sure what other version information might be useful.

“cuda-gdb --version” returns:

NVIDIA (R) CUDA Debugger
3.2 release
Portions Copyright (C) 2008-2010 NVIDIA Corporation
GNU gdb 6.6

Let me know if you need more info.

Thanks everyone!

(P.S. I tried to use the forum’s search function, but typing “cuda-gdb shared” and even selecting “titles only” gave me hundreds of unrelated results.)

For searching the forums, it works much better to just use Google and add site:forums.nvidia.com to the query. In this case, though, it would have been better to search the cuda-gdb manual, where these matters are covered in chapter 8.
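If I remember that chapter correctly, the trick is to tell the debugger which memory space a pointer refers to by qualifying its type with a storage specifier; without it, I believe the 0x40 is interpreted as a generic/global address, which is why the values look like garbage. Something along these lines should work (the exact syntax may vary between cuda-gdb versions):

(cuda-gdb) p *(@shared float*)shared
(cuda-gdb) p ((@shared float*)test_shared)[0]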

Thanks for the reply :-) I guess I should have checked the manual more carefully. By the way, do you also have a quick answer for why “next” in cuda-gdb steps into a function anyway (even, for example, powf)?

Thanks!

That happens because the functions get inlined.
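With the CUDA 3.x toolchain, __device__ functions are generally inlined, so their source lines become part of the calling kernel and “next” walks through them. For your own device functions you can usually avoid this by marking them __noinline__ (it is only a hint and is not always honored, and it won’t help for powf, which comes from the math headers). A minimal, hypothetical sketch:

// Hypothetical example: __noinline__ keeps the helper as a real call,
// so "next" in cuda-gdb can step over it in a single step.
__noinline__ __device__ float scale_and_bias(float v, float a, float b)
{
        return a * v + b;
}

__global__ void kernel_noinline_demo(float* out, int n)
{
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
        {
                // "next" on this line should not descend into scale_and_bias
                out[i] = scale_and_bias((float) i, 2.0f, 1.0f);
        }
}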

Got it :-) Thanks!