I've been having trouble debugging several of my CUDA kernels with cuda-gdb. In the following code sample, numThreads and threadID don't seem to hold the correct values in different threads.
[codebox]
__global__ void ChopScale(cufftComplex * d_RawData,
                          const unsigned int sample_cnt)
{
    // total number of threads in the grid, used as the loop stride
    const int numThreads = blockDim.x * gridDim.x;
    // global index of this thread within the grid
    const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    for (unsigned int i=threadID; i<sample_cnt; i+=numThreads)
    {
        d_RawData[i].x = d_RawData[i].x * 2;
        d_RawData[i].y = d_RawData[i].y * 2;
    }
    __syncthreads(); // wait for all threads
}
[/codebox]
I launch the kernel from my main program with the simple setup below; d_RawData is a 1D vector with more than 1024 elements.
[codebox]#define FFTWIDTH 1024
…
dim3 mygridDim, myblkDim;
mygridDim = dim3(1);
myblkDim = dim3(128);
…
ChopScale<<<mygridDim, myblkDim>>>((cufftComplex *)d_RawData, FFTWIDTH);[/codebox]
In thread <<<(0,0),(0,0,0)>>> the values look fine:
[codebox]
(cuda-gdb) n
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
ChopScale () at cudamemtest.cu:106
106 d_RawData[i].y = d_RawData[i].y * 2;
(cuda-gdb) i locals
i = 128
numThreads = 128
threadID = 0
d_RawData = (cufftComplex * const @global) 0xb66d3008
sample_cnt = 1024
[/codebox]
But if I switch to a different thread:
[codebox](cuda-gdb) thread <<<(0,0),(95,0,0)>>>
Switching to <<<(0,0),(95,0,0)>>> ChopScale () at cudamemtest.cu:106
[Current CUDA Thread <<<(0,0),(95,0,0)>>>]
ChopScale () at cudamemtest.cu:106
106 d_RawData[i].y = d_RawData[i].y * 2;
(cuda-gdb) i locals
i = 255
numThreads = 128
threadID = 127
d_RawData = (cufftComplex * const @global) 0xb66d3008
sample_cnt = 1024[/codebox]
Here threadID shows a wrong value of 127 when it should be 95, and the same thing sometimes happens to numThreads too.
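As a cross-check on what the debugger ought to show, the index arithmetic from the kernel can be replayed on the host. This is only a minimal sketch in plain C++: `LaunchDims`, `Expected`, and `expectedFor` are hypothetical names made up for illustration, not part of CUDA or cuda-gdb, and the code assumes a 1D grid and block as in the launch above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-ins for gridDim.x and blockDim.x (not CUDA API).
struct LaunchDims {
    unsigned gridDimX;
    unsigned blockDimX;
};

// Values a single thread is expected to hold, plus the indices its loop visits.
struct Expected {
    int numThreads;
    int threadID;
    std::vector<unsigned> indices;
};

// Replays the arithmetic from ChopScale for one (blockIdx.x, threadIdx.x) pair.
Expected expectedFor(LaunchDims d, unsigned blockIdxX, unsigned threadIdxX,
                     unsigned sampleCnt) {
    Expected e;
    e.numThreads = static_cast<int>(d.blockDimX * d.gridDimX);
    e.threadID = static_cast<int>(blockIdxX * d.blockDimX + threadIdxX);
    assert(e.numThreads > 0);  // a zero stride would never terminate the loop
    for (unsigned i = e.threadID; i < sampleCnt; i += e.numThreads) {
        e.indices.push_back(i);  // same stride loop as in the kernel
    }
    return e;
}
```

With one block of 128 threads and sample_cnt = 1024, calling `expectedFor(LaunchDims{1, 128}, 0, 95, 1024)` gives numThreads = 128, threadID = 95, and the indices 95, 223, ..., 991. The i = 255 and threadID = 127 reported above are what this arithmetic yields for threadIdx.x = 127 on its second iteration, not for thread 95.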
Another thing that confuses me: when I print the address of the threadID variable in both threads, I get the same memory address in each. Is that how it is supposed to be?
Has anyone had a similar problem, or does anyone have suggestions for solving it? Thank you so much.
My computer's configuration is:
MacBook Unibody (Late 2008) with GeForce 9400M
Ubuntu 9.04 32bit with kernel 2.6.28-11-generic
gcc/g++ version 4.3.3
cuda toolkit 2.3_linux_32_ubuntu9.04
cuda driver 2.3_linux_32_190.18
Merry Christmas and Happy New Year =)