I have been experimenting with CUDA to see whether it would be useful in a project, but I have run into an impasse: the __global__ function in one of my programs is simply not being called. I was hoping that someone here would know what could stop a __global__ function from being called, or how to diagnose such a problem.
Details:
I am running on a Windows Vista laptop with the 2.0 beta version of CUDA. I managed to get most of the sample programs to work.
The code that has the problem is:
complexptr w;
w.realptr=wr_dev;
w.imagptr=wi_dev;//these variables are CUDA device pointers
//-----------------------check for errors-----------------
//-----------------------calling the kernels---------------------
dim3 threadsize(block_size,block_size,block_size);
dim3 dimGrid ( (wsize0/(threadsize.x)) + ((!(wsize0/(threadsize.x)))?0:1),
               (wsize1/(threadsize.y)) + ((!(wsize1/(threadsize.y)))?0:1),
               (wsize2/(threadsize.z)) + ((!(wsize2/(threadsize.z)))?0:1) );
dim3 pass2gird (1,1,1);
dim3 pass2threadsize(dimGrid.x*dimGrid.y*dimGrid.z/2,1,1);
//to store the output of the 1st pass
complex* outbfer;
CUDA_SAFE_CALL_NO_SYNC(cudaMalloc((void**) &outbfer,dimGrid.x*dimGrid.y*dimGrid.z*sizeof(complex)));
complex* finaloutput;
CUDA_SAFE_CALL_NO_SYNC(cudaMalloc((void**) &finaloutput,1*sizeof(complex)));
...
sincreduce_3d<<<dimGrid,threadsize,threadsize.x*threadsize.y*threadsize.z*sizeof(complex)>>>(outbfer,w, R, Bx, By, Bz, wsize0,wsize1,wsize2,dim3(rX[i],rY[j],rZ[k]) );
My __global__ function sincreduce_3d and the structures complex and complexptr are defined as follows:
struct complex
{
float real;
float imag;
};
struct complexptr
{
float *realptr;
float *imagptr;
};
//NOTE: this was taken from the NVIDIA file reduction_kernel.cu; this MUST be documented
//WARNING
//reduces an input complexptr to an output complex pointer, one for each box
__global__ void sincreduce_3d(complex* out,complexptr w, const float R,const float Bx,const float By,const float Bz, const int nx,const int ny,const int nz,dim3 pointanted)
{
extern __shared__ complex buffers[];
However, when I step through the code in VC++ 2005 Express Edition, the __global__ function is not called and the output variables are not changed. This is in contrast to the other programs I have written in CUDA, where the debugger has worked.
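One way to narrow this down, sketched here under assumptions, is to ask the runtime for an error code right after the launch. A launch whose configuration the device rejects (for example, block or grid dimensions beyond what the device supports) returns without running the kernel, which from the host side looks exactly like the function never being called. In the sketch below, dummy_kernel and launch_ok are hypothetical names that stand in for the real kernel and are not part of the code above. cudaDeviceSynchronize is the name used by later toolkits; older ones such as 2.0 provide cudaThreadSynchronize for the same purpose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel; it only needs to be launchable.
__global__ void dummy_kernel(float* out)
{
    // A single thread writes, so there is no race on out[0].
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
        out[0] = 1.0f;
}

// Hypothetical helper: reports whether the most recent launch was accepted
// by the runtime and whether the kernel then ran to completion.
static bool launch_ok(const char* label)
{
    cudaError_t err = cudaGetLastError();      // errors from the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors raised while the kernel ran
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    float* out_dev = 0;
    if (cudaMalloc((void**)&out_dev, sizeof(float)) != cudaSuccess)
        return 1;

    dim3 grid(1, 1, 1);
    dim3 block(8, 8, 8);                       // 8*8*8 = 512 threads per block
    dummy_kernel<<<grid, block>>>(out_dev);

    bool ok = launch_ok("dummy_kernel");       // check immediately after the launch
    cudaFree(out_dev);
    return ok ? 0 : 1;
}
```

Placing the same check directly after the sincreduce_3d launch should show whether the runtime accepted dimGrid, threadsize and the dynamic shared memory size, or rejected the launch before any thread ran.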