Hi,
I am facing the following problem on a GeForce GTX 580 (Fermi-class) GPU.
Just to give you some background, I am reading single-byte samples packed in a file in the following order: Real(Signal 1), Imaginary(Signal 1), Real(Signal 2), Imaginary(Signal 2). (Each byte is a signed char, taking values between -128 and 127.) I read these into a char4 array and use a custom function to copy them into two float2 arrays, one per signal. (This is just an isolated part of a larger program.)
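To check the GPU output against a known-good result, a CPU version of the same deinterleaving step may help. This is a minimal sketch, assuming the file has already been read into a host char4 array of n samples; the function name DeinterleaveHost is hypothetical and is not part of the attached testfft.cu.

```cuda
#include <cstddef>
#include <cuda_runtime.h>  // char4, float2

// Hypothetical host-side reference for CopyDataForFFT.
// Each char4 holds one sample: x = Re(Signal 1), y = Im(Signal 1),
//                              z = Re(Signal 2), w = Im(Signal 2).
void DeinterleaveHost(const char4 *pc4Data, float2 *pf2X, float2 *pf2Y,
                      std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        // char4 members are signed char, so values stay in [-128, 127].
        pf2X[i].x = static_cast<float>(pc4Data[i].x);
        pf2X[i].y = static_cast<float>(pc4Data[i].y);
        pf2Y[i].x = static_cast<float>(pc4Data[i].z);
        pf2Y[i].y = static_cast<float>(pc4Data[i].w);
    }
}
```

Every integer in [-128, 127] is exactly representable as a float, so the GPU results copied back with cudaMemcpy can be compared to this reference with exact equality.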
When I run the program using cuda-memcheck, I get either:
or
or
The thread and block indices at which the invalid writes occur change from run to run.
The code is attached, and the main kernel is reproduced below. The strange thing is that this code works (and cuda-memcheck reports no errors) on a non-Fermi-class GPU that I have access to. Does the kernel need to be rewritten in some way for Fermi-class GPUs, or could the specific GPU I am working on be faulty? Another point to note is that the Fermi card gives no errors for N <= 8192, so I am more inclined towards the latter possibility.
Here is the kernel:
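```cuda
// Splits each interleaved char4 sample into the two complex FFT inputs.
__global__ void CopyDataForFFT(char4 *pc4Data,
                               float2 *pf2FFTInX,
                               float2 *pf2FFTInY)
{
    // One thread per sample.
    int i = (blockIdx.x * blockDim.x) + threadIdx.x;
    pf2FFTInX[i].x = (float) pc4Data[i].x;   // Re(Signal 1)
    pf2FFTInX[i].y = (float) pc4Data[i].y;   // Im(Signal 1)
    pf2FFTInY[i].x = (float) pc4Data[i].z;   // Re(Signal 2)
    pf2FFTInY[i].y = (float) pc4Data[i].w;   // Im(Signal 2)
    return;
}
```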
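Before suspecting the hardware, it may be worth ruling out an out-of-range index: the kernel writes to element i without comparing it to N, so if the grid is rounded up (or otherwise covers more threads than there are samples), the extra threads write past the end of the arrays. Below is a minimal sketch of a guarded version; the extra parameter iNumSamples, the block size of 256, and the helper names CeilDiv and LaunchCopy are assumptions, not taken from testfft.cu. It uses cudaDeviceSynchronize(), which exists from CUDA 4.0; on CUDA 3.2, cudaThreadSynchronize() plays the same role.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same as CopyDataForFFT, plus a bounds check against the sample count.
__global__ void CopyDataForFFTGuarded(const char4 *pc4Data,
                                      float2 *pf2FFTInX,
                                      float2 *pf2FFTInY,
                                      int iNumSamples)
{
    int i = (blockIdx.x * blockDim.x) + threadIdx.x;
    if (i >= iNumSamples) {
        return;  // threads past the last sample do nothing
    }
    char4 c4 = pc4Data[i];  // one 4-byte load instead of four 1-byte loads
    pf2FFTInX[i] = make_float2((float) c4.x, (float) c4.y);
    pf2FFTInY[i] = make_float2((float) c4.z, (float) c4.w);
}

// Number of blocks needed to cover n items, rounding up.
int CeilDiv(int n, int iBlockSize)
{
    return (n + iBlockSize - 1) / iBlockSize;
}

// Hypothetical launch wrapper; each device array is assumed to hold
// iNumSamples elements. Returns false on any CUDA error.
bool LaunchCopy(const char4 *d_pc4Data, float2 *d_pf2X, float2 *d_pf2Y,
                int iNumSamples)
{
    const int iBlockSize = 256;
    int iNumBlocks = CeilDiv(iNumSamples, iBlockSize);
    // On compute capability 2.0 the x grid dimension is limited to 65535.
    if (iNumBlocks > 65535) {
        fprintf(stderr, "too many blocks: %d\n", iNumBlocks);
        return false;
    }
    CopyDataForFFTGuarded<<<iNumBlocks, iBlockSize>>>(d_pc4Data, d_pf2X,
                                                     d_pf2Y, iNumSamples);
    cudaError_t err = cudaGetLastError();   // launch-configuration errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // errors raised while running
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

If cuda-memcheck still reports invalid writes with the guard in place, the index arithmetic becomes a less likely explanation.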
I use CUDA 4.0 (also tried CUDA 3.2) on RHEL 5.6. Details of the GPU that might be relevant:
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1535 MBytes
(16) Multiprocessors x (32) CUDA Cores/MP: 512 CUDA Cores
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
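These limits can also be read at run time, which makes it easy to check whether a given N fits in a single 1D launch and in device memory. A sketch, assuming device 0 and one char4 input plus two float2 outputs per sample (20 bytes); the helper names are hypothetical.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Largest sample count a 1D grid can cover with one thread per sample.
long long MaxSamples1D(int iMaxGridX, int iThreadsPerBlock)
{
    return (long long) iMaxGridX * iThreadsPerBlock;
}

// Device bytes per sample: one char4 input plus two float2 outputs.
size_t BytesPerSample()
{
    return sizeof(char4) + 2 * sizeof(float2);
}

// Prints the properties relevant to CopyDataForFFT for device 0.
void PrintLimits()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return;
    }
    printf("%s, compute %d.%d\n", prop.name, prop.major, prop.minor);
    printf("max samples per 1D launch: %lld\n",
           MaxSamples1D(prop.maxGridSize[0], prop.maxThreadsPerBlock));
    printf("max samples by memory: %zu\n",
           prop.totalGlobalMem / BytesPerSample());
}
```

With the figures listed above (65535 blocks of 1024 threads), MaxSamples1D gives 67,107,840, so N = 8192 is far below the launch limit.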
Any help will be greatly appreciated! Thank you!
testfft.cu (3.78 KB)