Profiler bug? grid size, block size

Hi, is there a bug in the Visual Profiler 1.1 Beta?
If I look at the code of transpose.cu (SDK project), it contains:

dim3 grid(size_x / BLOCK_DIM, size_y / BLOCK_DIM, 1);
dim3 threads(BLOCK_DIM, BLOCK_DIM, 1);

where BLOCK_DIM=16, size_x=256, size_y=4096, therefore
grid = [16, 256, 1]
threads = [16, 16, 1]

However, when I profile it with the Visual Profiler, it reports:
grid size X = 256
grid size Y = 16

block size X = 16
block size Y = 16
block size Z = 16

It seems that the profiler swaps the grid dimensions. From my other experiments, block size X and block size Y appear to be reported correctly, but block size Z is filled with the value of block size Y.
Maybe somebody has already reported this, but I couldn’t find it anywhere. Do you experience the same, or is it just my own mistake?