If I replace it with this slower implementation (modulo instead of bit mask), the kernel does not fail with the 280.13 drivers:
xm4 = k2DZSizeIn * ( x % 8);
xm3 = k2DZSizeIn * ((x + 1) % 8);
xm2 = k2DZSizeIn * ((x + 2) % 8);
xm1 = k2DZSizeIn * ((x + 3) % 8);
x0 = k2DZSizeIn * ((x + 4) % 8);
xp1 = k2DZSizeIn * ((x + 5) % 8);
xp2 = k2DZSizeIn * ((x + 6) % 8);
xp3 = k2DZSizeIn * ((x + 7) % 8);
The failure in the first case first occurs for xm1 with x = 5: (8 & 7) does not return 0, which leads to the code addressing unallocated memory. I have no easy way of knowing what the value actually returned is.
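
For reference, here is a minimal host-side sketch in plain C of what I expect the two variants to compute. It only illustrates that, for non-negative indices, the mask and the modulo should give identical offsets, so (5 + 3) & 7 should be 0. The names K2DZ_SIZE_IN, slice_offset_mask, slice_offset_mod and count_mismatches are hypothetical, and the stride value of 16 is an assumption made only for this illustration; it is not the value used in my kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for k2DZSizeIn; the real stride is not shown here. */
enum { K2DZ_SIZE_IN = 16 };

/* Offset of the slice selected with the bit mask (the fast variant). */
static size_t slice_offset_mask(unsigned x, unsigned shift) {
    return (size_t)K2DZ_SIZE_IN * ((x + shift) & 7u);
}

/* Offset of the slice selected with modulo (the slower variant). */
static size_t slice_offset_mod(unsigned x, unsigned shift) {
    return (size_t)K2DZ_SIZE_IN * ((x + shift) % 8u);
}

/* Counts the (x, shift) pairs for which the two variants disagree.
   On a conforming C compiler this is expected to be zero. */
static int count_mismatches(unsigned max_x) {
    int mismatches = 0;
    for (unsigned x = 0; x < max_x; ++x) {
        for (unsigned shift = 0; shift < 8; ++shift) {
            if (slice_offset_mask(x, shift) != slice_offset_mod(x, shift)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

/* Checks the specific case that fails on the device: x = 5, shift = 3. */
static void self_check(void) {
    assert(slice_offset_mask(5, 3) == 0);
    assert(count_mismatches(64) == 0);
}
```

Calling self_check() on the host passes, which is why I suspect the compiled kernel rather than the arithmetic. One way to see the value on the device might be to write the masked index to an extra output buffer and read it back, though I have not tried that here.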