Hello,
I’m having a problem debugging a test kernel for shared memory access. All I do is create an NxN matrix, have each thread write a number to its block’s shared memory, and then copy that number to global memory for the host application to read. The program works. The only problem is that when I use cuda-gdb to watch what happens (not because anything goes wrong, but simply to experiment with cuda-gdb), I can’t read the data stored in shared memory.
The relevant code excerpts follow (data_test points to the global memory where the results are written; Parameters is a struct containing the dimensions of the data_test matrix, which is allocated with cudaMallocPitch):
__global__ void kernel_test_shared(float* data_test, size_t pitch_test, Parameters alg_params)
{
    // Copy parameters to local variables
    int height = alg_params.height;
    int width = alg_params.width;
    // Set up the shared matrix (dynamically sized at kernel launch)
    extern __shared__ char shared[];
    float *test_shared = (float*) shared;
    // Compute target element coordinates
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    // Check limits: valid indices are 0..width-1 and 0..height-1.
    // Out-of-range threads do not return early, so that every thread
    // in the block still reaches __syncthreads().
    bool in_bounds = (x < width && y < height);
    // Write to corresponding element in the shared memory
    if (in_bounds)
    {
        test_shared[threadIdx.y*blockDim.x + threadIdx.x] = y*width + x;
    }
    // Wait for all threads
    __syncthreads();
    // Copy to global memory (pitch_test is the row stride in bytes)
    if (in_bounds)
    {
        *(((float*) (((char*) data_test) + y*pitch_test)) + x) = test_shared[threadIdx.y*blockDim.x + threadIdx.x];
    }
}
(I used an awkward syntax to access the data_test array, but that was just to make sure I was writing everything correctly.)
When I run cuda-gdb and try to inspect the shared or test_shared pointers, all I get is:
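In case that expression is hard to read, here is a minimal sketch of the same pitched access written as a small helper. The name pitched_elem is made up for illustration and is not in my actual code. It assumes the pitch is the row stride in bytes, as returned by cudaMallocPitch.

```cpp
#include <cstddef>

// Let the same helper compile as plain C++ when nvcc is not in use.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper (not part of the kernel above): returns a pointer to
// element (x, y) of a pitched 2D float array. pitch is the row stride in
// BYTES, so the row offset is applied on a char* before indexing by x.
__host__ __device__ inline float* pitched_elem(float* base, std::size_t pitch, int x, int y)
{
    char* row = reinterpret_cast<char*>(base) + static_cast<std::size_t>(y) * pitch;
    return reinterpret_cast<float*>(row) + x;
}
```

With such a helper, the final store in the kernel would read *pitched_elem(data_test, pitch_test, x, y) = test_shared[threadIdx.y*blockDim.x + threadIdx.x];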
(cuda-gdb) p shared
$1 = 0x40 <Address 0x40 out of bounds>
(cuda-gdb) p test_shared
$2 = (@global float * @register) 0x40
/* step program until all threads write to shared memory */
(cuda-gdb) p test_shared[0]
$3 = 1.72208689e-22
(cuda-gdb) p test_shared[1]
$4 = 3.33029199
The last two values appear to be random (by the way, if I try to access address 0x40 directly, I get a “Cannot access memory at address 0x40” error).
As for version numbers, “nvcc --version” reports 3.2. I’m not sure how to find any other version information that might be useful.
“cuda-gdb --version” returns:
NVIDIA (R) CUDA Debugger
3.2 release
Portions Copyright (C) 2008-2010 NVIDIA Corporation
GNU gdb 6.6
Let me know if you might need more info.
Thanks everyone!