Hi,
I am trying to implement MRI gridding in CUDA. The algorithm interpolates data sampled along a spiral onto a Cartesian grid. I have designed it as an input-driven assignment, meaning each thread id is assigned to one sample of the input spiral data. The input is a 1 x N array of complex numbers, and the grid is K x K, which has far fewer elements than the input. In my case the input is 1 x 12712 and the grid is 64 x 64. The result is then returned to MATLAB through a MEX function. Here are some snippets of the code:
Assignment of the grid dimensions on the device:
dim3 dimBlock(256);
dim3 dimGrid((1*len)/dimBlock.x);
if((1*len)%256!=0)dimGrid.x+=1;
Here len = 12712.
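As a sanity check on the launch configuration above, the same block count can be computed with a ceiling division. This is only a minimal sketch; the helper name launchGridFor is made up for illustration and is not part of my actual code.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from my real code): number of blocks needed so that
// blocks * threadsPerBlock >= len, i.e. a ceiling division.
dim3 launchGridFor(unsigned int len, unsigned int threadsPerBlock)
{
    return dim3((len + threadsPerBlock - 1) / threadsPerBlock);
}
```

For len = 12712 and 256 threads per block this gives 50 blocks, i.e. 12800 threads, so the last 88 threads have an idx that is >= len and would need to be skipped inside the kernel.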
Calling the kernel:
interpolation<<<dimGrid,dimBlock>>>(devicein_r,devicein_i,K,N,J,kx1,ky1,deviceout_r,deviceout_i,fn,Ofactor,len,dcf);
// returning the data from the device after interpolation:
__global__ void interpolation(float *devicein_r, float *devicein_i, int K, int N,
int J, float *kx1, float *ky1, float *deviceout_r, float *deviceout_i, float *fn, int Ofactor, int len, float *dcf)
{
int idx = blockIdx.x*blockDim.x + threadIdx.x;
//initializing the device output to zero.
if(idx< K*K)
{
*(deviceout_r + idx) = 0.0;
*(deviceout_i + idx) = 0.0;
}
for (x=minx;x<=maxx;x++)
{
..... (interpolation code)
interp = interpx*interpy;
temp_cord = (xtemp+1)+ (ytemp*K) - 1;
*(deviceout_r+temp_cord)+= *(devicein_r+idx)*interp*(*(dcf+idx));
*(deviceout_i+temp_cord)+= *(devicein_i+idx)*interp*(*(dcf+idx));
}
}
Now when I try to run this, I find that once the input data set grows beyond 5000 elements the output values all turn out to be zero, even though the thread ids are supposed to run up to len. I am not sure whether the problem is with the thread assignment or with how the device output array is accessed. I am having serious problems with this.
Also, as far as I know the maximum number of threads that can be called is 12288 for a GeForce 8800. But in the above case it should at least work up to that many threads, right? And how do I handle the samples that go beyond that? I mean, will a simple for loop that does not use idx work within the device? Please help.
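To make the last question concrete, here is a minimal sketch of the kind of loop I have in mind: each thread starts at its own index and advances by the total number of launched threads, so a small launch can still cover all len samples. Everything here is a simplification and an assumption, not my real code: the kernel scatterSamples, the driver runScatterDemo and the precomputed cell array are hypothetical stand-ins for the interpolation, the output is zeroed on the host side with cudaMemset before the launch, and the accumulation uses atomicAdd on float, which needs compute capability 2.0 or later (so newer hardware than the 8800).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical simplified kernel: every input sample adds its weighted value
// to one grid cell. cell[i] is assumed to hold an index in [0, K*K).
__global__ void scatterSamples(const float *in_r, const float *dcf,
                               const int *cell, float *out_r, int len)
{
    int stride = gridDim.x * blockDim.x;  // total number of launched threads
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < len; i += stride) {
        // Several samples can land in the same cell, so the update is atomic.
        atomicAdd(&out_r[cell[i]], in_r[i] * dcf[i]);
    }
}

// Hypothetical host driver: fills the inputs with 1.0, runs the kernel with
// `blocks` blocks of 256 threads, and returns the sum over the whole K x K grid.
float runScatterDemo(int len, int K, int blocks)
{
    std::vector<float> in(len, 1.0f), dcf(len, 1.0f), out(K * K, 0.0f);
    std::vector<int> cell(len);
    for (int i = 0; i < len; ++i) cell[i] = i % (K * K);

    float *d_in, *d_dcf, *d_out;
    int *d_cell;
    cudaMalloc(&d_in, len * sizeof(float));
    cudaMalloc(&d_dcf, len * sizeof(float));
    cudaMalloc(&d_cell, len * sizeof(int));
    cudaMalloc(&d_out, K * K * sizeof(float));
    cudaMemcpy(d_in, in.data(), len * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dcf, dcf.data(), len * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_cell, cell.data(), len * sizeof(int), cudaMemcpyHostToDevice);

    // Zero the output before the launch instead of inside the kernel.
    cudaMemset(d_out, 0, K * K * sizeof(float));

    scatterSamples<<<blocks, 256>>>(d_in, d_dcf, d_cell, d_out, len);
    cudaMemcpy(out.data(), d_out, K * K * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in); cudaFree(d_dcf); cudaFree(d_cell); cudaFree(d_out);

    float total = 0.0f;
    for (int i = 0; i < K * K; ++i) total += out[i];
    return total;
}
```

Calling runScatterDemo(12712, 64, 8) launches only 2048 threads for 12712 samples; if the loop works the way I hope, every sample is still processed exactly once and the returned sum should equal len.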
Regards
Kashyap