If you have a 3D for loop to translate into a kernel, for example:
for (i = 1; i < 5; i++) {
    for (j = 1; j < 5; j++) {
        for (k = 1; k < 5; k++) {
            a[i][j][k] = b[i][j][k] + c[i][j][k];
        }
    }
}
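Define your kernel launch parameters as follows. Each loop runs 4 iterations (indices 1 to 4), so launch 4 blocks of 4 x 4 threads; a size of 5 would produce index 5, which the loops never touch:
dim3 grid(4);      // one block per value of i
dim3 block(4, 4);  // threadIdx.x covers k, threadIdx.y covers j
kernel<<<grid, block>>>();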
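And your kernel:
__global__ void kernel()
{
    // a, b and c are assumed to be global __device__ arrays visible here
    int ix = blockIdx.x + 1;   // replaces loop index i
    int iy = threadIdx.y + 1;  // replaces loop index j
    int iz = threadIdx.x + 1;  // replaces loop index k
    a[ix][iy][iz] = b[ix][iy][iz] + c[ix][iy][iz];
}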
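
To see the whole mapping in one place, here is a minimal sketch that checks the kernel against the original CPU loop. Several things in it are assumptions and not part of the snippets above: the arrays are declared as `__device__` globals of extent `N = 5` per dimension (so indices 1 to 4 are valid), the element type is `float`, and `run_add3d` is a hypothetical helper name. It launches 4 blocks of 4 x 4 threads so that `blockIdx.x + 1`, `threadIdx.y + 1` and `threadIdx.x + 1` each cover exactly 1 to 4, matching the loop bounds.

```cuda
#include <cuda_runtime.h>

#define N 5  // assumed array extent per dimension; valid indices are 0..4

// Assumption: the arrays live in device global memory so the kernel can
// keep the a[i][j][k] syntax without taking any parameters.
__device__ float a[N][N][N];
__device__ float b[N][N][N];
__device__ float c[N][N][N];

__global__ void kernel()
{
    int ix = blockIdx.x  + 1;  // replaces loop index i
    int iy = threadIdx.y + 1;  // replaces loop index j
    int iz = threadIdx.x + 1;  // replaces loop index k
    a[ix][iy][iz] = b[ix][iy][iz] + c[ix][iy][iz];
}

// Hypothetical helper: fills b and c on the host, runs the kernel, and
// returns the number of elements that differ from the CPU result
// (or -1 if a CUDA call fails).
int run_add3d()
{
    static float ha[N][N][N], hb[N][N][N], hc[N][N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                ha[i][j][k] = 0.0f;
                hb[i][j][k] = 100.0f * i + 10.0f * j + k;
                hc[i][j][k] = 1.0f;
            }

    // Copy the host data into the __device__ arrays.
    if (cudaMemcpyToSymbol(a, ha, sizeof(ha)) != cudaSuccess) return -1;
    if (cudaMemcpyToSymbol(b, hb, sizeof(hb)) != cudaSuccess) return -1;
    if (cudaMemcpyToSymbol(c, hc, sizeof(hc)) != cudaSuccess) return -1;

    dim3 grid(N - 1);          // 4 blocks: one per value of i
    dim3 block(N - 1, N - 1);  // 4 x 4 threads: x covers k, y covers j
    kernel<<<grid, block>>>();
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    if (cudaMemcpyFromSymbol(ha, a, sizeof(ha)) != cudaSuccess) return -1;

    // Compare against the original triple loop.
    int mismatches = 0;
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            for (int k = 1; k < N; k++)
                if (ha[i][j][k] != hb[i][j][k] + hc[i][j][k])
                    mismatches++;
    return mismatches;
}
```

Calling `run_add3d()` from a `main` function and compiling with `nvcc` should give 0 mismatches on a working device. For larger loops you would normally pass pointers to the kernel and add a bounds check, since a block is limited to 1024 threads.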