 # Index calculation problem! GridSize or/and BlockSize?!?!

Hello developers,

I am stuck on a index calculation problem and I am not able to see the source of the problem. The code works fine for blockSize = [16,16,1] or [32,16,1] or [16,16,2]

but for example it doesn’t work for blockSize = [8,8,8] or [16,4,4] or [16,8,4] etc. If somebody can help me it would be perfect!

The program calculates the FFT of a 3D volume with a ramp filter. The following code runs a complex multiplication in the frequency domain. The fftDimension is 513x484x16 and the kernelDimension is 513x1:

int tBlock_x = 8;
int tBlock_y = 8;
int tBlock_z = 8;
int blocksInX = (fftDimension.x - 1) / tBlock_x + 1;
int blocksInY = (fftDimension.y - 1) / tBlock_y + 1;
int blocksInZ = (fftDimension.z - 1) / tBlock_z + 1;

// Cuda capability < 2.0 -> Grid Dimension = 2
dim3 dimGrid = dim3(blocksInX, blocksInY*blocksInZ);
dim3 dimBlock = dim3(tBlock_x, tBlock_y, tBlock_z);

multiply_kernel <<< dimGrid, dimBlock >>> ( deviceProjectionFFT,
fftDimension,
deviceKernelFFT,
blocksInY,
1.0f/(float)blocksInY );

******* K E R N E L *******

global
void
multiply_kernel(cufftComplex *projFFT, int3 fftDimension, cufftComplex *kernelFFT, unsigned int Blocks_Y, float invBlocks_Y)
{
unsigned int blockIdx_z = __float2uint_rd(blockIdx.y * invBlocks_Y);
unsigned int blockIdx_y = blockIdx.y - __umul24(blockIdx_z, Blocks_Y);
unsigned int i = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
unsigned int j = __umul24(blockIdx_y, blockDim.y) + threadIdx.y;
unsigned int k = __umul24(blockIdx_z, blockDim.z) + threadIdx.z;

if (i >= fftDimension.x || j >= fftDimension.y || k >= fftDimension.z)
return;

long int proj_idx = i + (j + k * fftDimension.y ) * fftDimension.x;

cufftComplex result;
result.x = projFFT[proj_idx].x * kernelFFT[i].x - projFFT[proj_idx].y * kernelFFT[i].y;
result.y = projFFT[proj_idx].y * kernelFFT[i].x + projFFT[proj_idx].x * kernelFFT[i].y;
projFFT[proj_idx] = result;
}

Thank you very much! ;-)

Does it work if you replace your fragile construct for approximate integer division

``````__float2uint_rd(blockIdx.y * invBlocks_Y)
``````

by a real integer div?

tera, thank very much!

``````unsigned int blockIdx_z = int((blockIdx.y +0.5f) * invBlocks_Y);