While trying to reduce the number of registers used by a kernel that will at some point be part of a much larger one, I've experimented with the volatile keyword.

This is what the kernel looked like at first:

```
__global__ void convolve(float* result)
{
    const int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    const int z = idx / (DATA_W * DATA_H);
    const int y = (idx - z * DATA_W * DATA_H) / DATA_W;
    const int x = (idx - z * DATA_W * DATA_H - y * DATA_W);
    if (x < DATA_W && y < DATA_H && z < DATA_D)
    {
        float sum = 0;
        for (int kz = -(KERNEL_D - KERNEL_Z - 1); kz <= KERNEL_Z; kz++)
        {
            for (int ky = -(KERNEL_H - KERNEL_Y - 1); ky <= KERNEL_Y; ky++)
            {
                for (int kx = -(KERNEL_W - KERNEL_X - 1); kx <= KERNEL_X; kx++)
                {
                    int dx = (idx - z * DATA_W * DATA_H - y * DATA_W) + kx;
                    int dy = y + ky;
                    int dz = z + kz;
                    if (dx < 0) dx = 0;
                    if (dy < 0) dy = 0;
                    if (dz < 0) dz = 0;
                    if (dx >= DATA_W) dx = DATA_W - 1;
                    if (dy >= DATA_H) dy = DATA_H - 1;
                    if (dz >= DATA_D) dz = DATA_D - 1;
                    float filterVal = d_Kernel[((KERNEL_Z - kz) * KERNEL_H + (KERNEL_Y - ky)) * KERNEL_W + (KERNEL_X - kx)];
                    float pixelVal = tex3D(texData, dx, dy, dz);
                    sum += filterVal * pixelVal;
                }
            }
        }
        result[(z * DATA_H + y) * DATA_W + x] = (float)sum;
    }
}
```

Here the KERNEL_* values are const ints defined in the header.

I've then copied the KERNEL_* values into volatile local variables, which as far as I can tell puts them in registers, and the register count has gone down, from 18 to 12?

```
__global__ void convolve(float* result)
{
    const int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    const int z = idx / (DATA_W * DATA_H);
    const int y = (idx - z * DATA_W * DATA_H) / DATA_W;
    const int x = (idx - z * DATA_W * DATA_H - y * DATA_W);
    <b>volatile int kkd = KERNEL_D, kkh = KERNEL_H, kkz = KERNEL_Z, kky = KERNEL_Y, kkx = KERNEL_X, kkw = KERNEL_W;</b>
    if (x < DATA_W && y < DATA_H && z < DATA_D)
    {
        float sum = 0;
        <b>for (int kz = -(kkd - kkz - 1); kz <= kkz; kz++)</b>
        {
            <b>for (int ky = -(kkh - kky - 1); ky <= kky; ky++)</b>
            {
                <b>for (int kx = -(kkw - kkx - 1); kx <= kkx; kx++)</b>
                {
                    int dx = (idx - z * DATA_W * DATA_H - y * DATA_W) + kx;
                    int dy = y + ky;
                    int dz = z + kz;
                    if (dx < 0) dx = 0;
                    if (dy < 0) dy = 0;
                    if (dz < 0) dz = 0;
                    if (dx >= DATA_W) dx = DATA_W - 1;
                    if (dy >= DATA_H) dy = DATA_H - 1;
                    if (dz >= DATA_D) dz = DATA_D - 1;
                    <b>float filterVal = d_Kernel[((kkz - kz) * kkh + (kky - ky)) * kkw + (kkx - kx)];</b>
                    float pixelVal = tex3D(texData, dx, dy, dz);
                    sum += filterVal * pixelVal;
                }
            }
        }
        result[(z * DATA_H + y) * DATA_W + x] = (float)sum;
    }
}
```

So, this is probably the worst question anyone can ask, but what have I actually done?

Edit: well, it seems I can't use bold inside code blocks. I guess you can still see which lines have been modified from the bold tags, so I'll leave them there.
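
In case it helps anyone checking the indexing: below is a minimal host-side C++ sketch of the same flat-index decomposition and clamp-to-edge logic the kernel uses. The DATA_W/DATA_H/DATA_D values and the helper names (Voxel, decompose, flatten, clampCoord, roundTripsForAllVoxels) are placeholders made up for illustration, not the real sizes or code from my project.

```cpp
#include <cassert>

// Placeholder volume size, for illustration only (not the real values).
constexpr int DATA_W = 4;
constexpr int DATA_H = 3;
constexpr int DATA_D = 2;

// Hypothetical helper type holding one voxel coordinate.
struct Voxel { int x, y, z; };

// Same arithmetic as the top of the kernel: split a flat index into (x, y, z).
inline Voxel decompose(int idx)
{
    const int z = idx / (DATA_W * DATA_H);
    const int y = (idx - z * DATA_W * DATA_H) / DATA_W;
    const int x = idx - z * DATA_W * DATA_H - y * DATA_W;
    return {x, y, z};
}

// Inverse mapping, matching the index used for the final store into result[].
inline int flatten(const Voxel& v)
{
    return (v.z * DATA_H + v.y) * DATA_W + v.x;
}

// Clamp-to-edge: equivalent to the two if statements applied per coordinate.
inline int clampCoord(int v, int size)
{
    if (v < 0) return 0;
    if (v >= size) return size - 1;
    return v;
}

// Returns true if decompose() followed by flatten() gives back every index.
inline bool roundTripsForAllVoxels()
{
    for (int idx = 0; idx < DATA_W * DATA_H * DATA_D; ++idx)
        if (flatten(decompose(idx)) != idx) return false;
    return true;
}
```

This only mirrors the integer arithmetic on the CPU; it says nothing about register usage, which is the part I'm actually asking about.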