Compute shader: imageStore seems to transpose every 4x4 pixel group.

Using a very simple shader with a 16x16 rgba32f image as output:

#version 450
#extension GL_ARB_shading_language_420pack : enable
#extension GL_ARB_compute_shader : enable

layout(binding = 1, rgba32f) uniform image2D outputs;

layout (local_size_x = 16, local_size_y = 16) in;
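void main() {
  // Assumes a single 16x16 workgroup, so the local ID covers every texel of the image.
  ivec2 coords = ivec2(gl_LocalInvocationID.xy);
  // Store (x, y, flattened index, 0.99); for this workgroup size
  // gl_LocalInvocationIndex == y * 16 + x.
  imageStore(outputs, coords, vec4(coords, gl_LocalInvocationIndex, 0.99));
}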

After running the shader, dumping the values of the image gives the following:
0 0 0 0.99
0 1 16 0.99
0 2 32 0.99
0 3 48 0.99
4 0 4 0.99
4 1 20 0.99
4 2 36 0.99
4 3 52 0.99
8 0 8 0.99
8 1 24 0.99
8 2 40 0.99
8 3 56 0.99
12 0 12 0.99
12 1 28 0.99

The compute shader’s target image uses optimal tiling. It is copied to a linear-tiling image, which is then read on the host to produce the result above. I’m reading the VkSubresourceLayout to get the correct row pitch etc., but that doesn’t explain why the values are transposed like this. Is there something I’m missing here, such as how the pixels are laid out in a linear-tiling image?

Thanks,
Johan