Hi all!

I have recently “finished” my thesis project in CUDA. By “finished” I mean that I brought the run time down from 1h30m to 10-12 minutes and I obtain the expected output (a breast mammography). However, the image has an unwanted grid that matches the iterations I am doing: the image is 352x896 and my dimGrid and dimBlock are (11,7) and (8,8) respectively, so I reconstruct image blocks of 88x56 at a time. That gives 4 iterations along the X axis and 16 along the Y axis, which is exactly the size of the grid that appears in the image.

I have lost count of how many times I have reviewed my code. I am not expecting anyone to waste their time on it, as I know I am probably the only one who can fix this, but I would like some advice. Below are the parts of my code that I think are critical to this error.

The iterations over the image blocks (88x56):

```
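// Reconstruct the image one 88x56 block at a time (4 blocks along X, 16 along Y).
for (int i = 0; i < 352; i += 88) {
    for (int j = 0; j < 896; j += 56) {
        iteration(i, j, object, scale, xbinsize, ybinsize, zbinsize, detectorXDim, detectorYDim, NSlices, xfocus, yfocus, zfocus, matprojections);
    }
}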
```
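The block arithmetic behind these loops can be double-checked on the host. This is only a minimal sketch: `Tiling`, `makeTiling` and `tileOffsets` are hypothetical helper names that are not part of my project, and the numbers are the ones quoted above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Geometry of one kernel launch and of the whole image (hypothetical helper type).
struct Tiling {
    int tileW, tileH;    // pixels covered by one launch along X and Y
    int tilesX, tilesY;  // launches needed along X and Y
};

// Values from the post: dimGrid = (11,7), dimBlock = (8,8), image 352x896.
Tiling makeTiling(int gridX, int gridY, int blockX, int blockY, int imgW, int imgH) {
    Tiling t;
    t.tileW = gridX * blockX;
    t.tileH = gridY * blockY;
    // The loops only cover the image exactly if it is a whole number of blocks.
    assert(imgW % t.tileW == 0 && imgH % t.tileH == 0);
    t.tilesX = imgW / t.tileW;
    t.tilesY = imgH / t.tileH;
    return t;
}

// Offsets (i, j) passed to iteration(), in the same order as the nested loops.
std::vector<std::pair<int, int>> tileOffsets(const Tiling &t) {
    std::vector<std::pair<int, int>> out;
    for (int i = 0; i < t.tilesX * t.tileW; i += t.tileW)
        for (int j = 0; j < t.tilesY * t.tileH; j += t.tileH)
            out.push_back({i, j});
    return out;
}
```

With the values above, `makeTiling(11, 7, 8, 8, 352, 896)` gives blocks of 88x56 and 4 x 16 = 64 launches, so the block borders fall at multiples of 88 in X and 56 in Y.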

The only kernel that uses the variables i and j (there are three similar kernels, one for each of the intersections of the ray, from the X-ray source to the detector bin, with the X, Y and Z axes):

```
__global__ void sysmaty(float *intersectionsx, float *intersectionsy, float *intersectionsz, int *biny,
                        float xfocus, float yfocus, float zfocus,
                        float xbinsize, float ybinsize, int zbinsize,
                        int detectorXDim, int detectorYDim, int NSlices,
                        int iiterationsu, int jiterationsu)
{
    int tx = threadIdx.x, ty = threadIdx.y, bx = blockIdx.x, by = blockIdx.y;
    float x, y, z, t;
    // Detector bin handled by this thread: position inside the 88x56 block plus the block offset.
    int idy = (ty + by * BLOCKSIZE) + jiterationsu;
    int idx = (tx + bx * BLOCKSIZE) + iiterationsu;
    // Direction of the ray from the bin centre (at z = -17) to the focus.
    float slopeVectorx = xfocus - ((float)idx + 0.5) * xbinsize;
    float slopeVectory = yfocus - ((float)idy + 0.5) * ybinsize;
    float slopeVectorz = zfocus + 17.0f;
    // Start of this bin's row in the output arrays (block-local bin index, detectorYDim slots per bin).
    int row = ((idx - iiterationsu) + 88 * (idy - jiterationsu)) * detectorYDim;
    //__syncthreads();
    if (idy * ybinsize < yfocus) {
        // Walk the y planes upwards from the bin, storing plane yint in slot yint-1.
        for (int yint = idy + 1; yint <= yfocus / (float)SCALE && yint <= detectorYDim; yint++) {
            y = (float)((float)yint * ybinsize);
            t = (float)((y - ((float)idy + 0.5) * ybinsize) / slopeVectory);
            x = (float)(((float)idx + 0.5) * xbinsize + t * slopeVectorx);
            z = -17.0f + t * slopeVectorz; // supposedly, where the zero is there will be a z coordinate
            intersectionsx[row + (yint - 1)] = x;
            intersectionsy[row + (yint - 1)] = y;
            intersectionsz[row + (yint - 1)] = z;
            biny[row + (yint - 1)] = idx + detectorXDim * idy;
        }
    }
    else if (idy * ybinsize > yfocus) {
        // Walk the y planes downwards from the bin, while counter fills slots upwards from idy.
        for (int yint = idy - 1, counter = idy + 1; yint >= yfocus / (float)SCALE && yint >= 0; yint--, counter++) {
            y = (float)((float)yint * ybinsize);
            t = (float)((y - ((float)idy + 0.5) * ybinsize) / slopeVectory);
            x = (float)(((float)idx + 0.5) * xbinsize + t * slopeVectorx);
            z = -17.0f + t * slopeVectorz;
            intersectionsx[row + (counter - 1)] = x;
            intersectionsy[row + (counter - 1)] = y;
            intersectionsz[row + (counter - 1)] = z;
            biny[row + (counter - 1)] = idx + detectorXDim * idy;
        }
    }
}
```
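To see which slots of a bin's row the kernel actually writes, the two loops can be mirrored on the host. This is a minimal sketch, not part of my project: `SlotRange` and `writtenSlots` are hypothetical names, `scale` stands in for the `SCALE` macro, and any values passed in for testing (bin size, focus position, detector size) are assumptions.

```cpp
#include <cassert>

// Which slots of one bin's row get written (hypothetical helper type).
struct SlotRange {
    int first;  // first slot written, -1 if none
    int last;   // last slot written, -1 if none
    int count;  // number of slots written
};

// Mirrors the loop bounds of sysmaty for a detector row idy, without the geometry.
SlotRange writtenSlots(int idy, float ybinsize, float yfocus, float scale, int detectorYDim) {
    SlotRange r = {-1, -1, 0};
    if (idy * ybinsize < yfocus) {
        // Upward branch: plane yint is stored in slot yint-1.
        for (int yint = idy + 1; yint <= yfocus / scale && yint <= detectorYDim; yint++) {
            int slot = yint - 1;
            if (r.count == 0) r.first = slot;
            r.last = slot;
            r.count++;
        }
    } else if (idy * ybinsize > yfocus) {
        // Downward branch: yint decreases while the slot (counter-1) increases.
        for (int yint = idy - 1, counter = idy + 1; yint >= yfocus / scale && yint >= 0; yint--, counter++) {
            int slot = counter - 1;
            if (r.count == 0) r.first = slot;
            r.last = slot;
            r.count++;
        }
    }
    return r;
}
```

In both branches the first slot written is `idy`, so slots `0` to `idy-1` of the row are left untouched by a launch, and in the downward branch the last slot grows with `idy`. Comparing `last` against `detectorYDim - 1` for the real parameters would show whether any write leaves the row.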

The allocation of the arrays used in the previous function (host side):

```
// One row of detectorYDim slots per thread of a launch (dimGrid * dimBlock = 88x56 bins).
size_t nEntries = (size_t)dimGrid.x * dimGrid.y * dimBlock.x * dimBlock.y * detectorYDim;
float *intersectionsYx_h = (float *)malloc(nEntries * sizeof(float));
float *intersectionsYy_h = (float *)malloc(nEntries * sizeof(float));
float *intersectionsYz_h = (float *)malloc(nEntries * sizeof(float));
int *binY_h = (int *)malloc(nEntries * sizeof(int));
```
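The size of these buffers can be compared with the largest index the kernel computes. This is a minimal sketch with hypothetical helper names (`entriesPerBuffer`, `flatIndex`); a `detectorYDim` of 896 in the usage below is an assumption taken from the image height, not a value I have stated.

```cpp
#include <cassert>
#include <cstddef>

// Elements in one buffer: one row of detectorYDim slots per thread of a launch.
std::size_t entriesPerBuffer(int gridX, int gridY, int blockX, int blockY, int detectorYDim) {
    return (std::size_t)gridX * gridY * blockX * blockY * detectorYDim;
}

// Same flattening as in sysmaty: block-local bin (localX, localY), then the slot in its row.
// tileW is the block width in bins (88 in the kernel).
std::size_t flatIndex(int localX, int localY, int slot, int tileW, int detectorYDim) {
    return ((std::size_t)localX + (std::size_t)tileW * localY) * detectorYDim + slot;
}
```

For example, `entriesPerBuffer(11, 7, 8, 8, 896)` is 4415488 and `flatIndex(87, 55, 895, 88, 896)` is 4415487, the last valid element. A slot of `detectorYDim` or more would land in the next bin's row or, for the last bin, past the end of the buffer.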

The output image (in reality the image is 704x896x60): this is just one of the 60 slices, and only the half of the detector that contains the breast. The other half is obtained by symmetry in one of the reconstruction steps.


Thank you in advance