cudaMalloc3D() returns erroneous pitchedPtr in toolkit 4.0

I ran into difficulties while trying to copy a 3D volume of integer values from device to host. While debugging, I noticed an unexpected value in the xsize field of the cudaPitchedPtr struct that cudaMalloc3D returns.

int width = 434;
int height = 383;
int depth = 20;

cudaPitchedPtr pitchedPointer;
// For linear memory, the extent width is given in bytes.
cudaExtent extent = make_cudaExtent(width * sizeof(int), height, depth);
cudaMalloc3D(&pitchedPointer, extent);

The values returned in pitchedPointer are:

pitchedPointer.pitch → 2048

seems ok; a row is aligned to 512 elements in width, multiplied by sizeof(int), which is 4 => 2048
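To make the pitch arithmetic concrete, here is a minimal host-only sketch of rounding a row width up to an alignment boundary. The 512-byte alignment and the helper names round_up and pitch_for_int_row are my assumptions for illustration; the real alignment is chosen by the driver and may differ per device.

```cpp
#include <cassert>
#include <cstddef>

// Round a row width in bytes up to the next multiple of `alignment`.
// Precondition: alignment > 0. The alignment value is an assumption here,
// not something queried from the CUDA runtime.
constexpr std::size_t round_up(std::size_t width_bytes, std::size_t alignment) {
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Pitch in bytes for a row holding `width` int elements.
std::size_t pitch_for_int_row(std::size_t width, std::size_t alignment) {
    assert(alignment > 0);
    return round_up(width * sizeof(int), alignment);
}
```

With a 4-byte int, pitch_for_int_row(434, 512) rounds 1736 bytes up to 2048, which matches the pitch reported above.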

pitchedPointer.ysize → 383 also ok

pitchedPointer.xsize → 1736

not ok; according to the Reference Manual (v 4.0), pg. 50, xsize is the logical width of the allocation in elements.

However, it seems that xsize is returned in bytes: 1736 = sizeof(int) * 434.

Basically, when I try a cudaMemcpy3D from device to host, the values I get are completely messed up unless I manually set pitchedPointer.xsize to 434.
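For anyone trying to reason about the layout, here is a host-only sketch of how a pitched volume maps onto a tightly packed array. PitchedVolume is a hypothetical stand-in for the fields of cudaPitchedPtr (it is not a CUDA type), and byte_offset and unpack are illustrative helpers. The sketch assumes xsize holds the used row width in bytes, as observed above. It only models the addressing and says nothing about what cudaMemcpy3D does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side model of a pitched 3D allocation of ints.
struct PitchedVolume {
    std::vector<unsigned char> bytes; // pitch * ysize * depth bytes
    std::size_t pitch;                // bytes per padded row
    std::size_t xsize;                // used row width in bytes (assumption)
    std::size_t ysize;                // rows per slice
    std::size_t depth;                // number of slices
};

// Byte offset of element (x, y, z): slices are pitch * ysize bytes apart,
// rows are pitch bytes apart.
inline std::size_t byte_offset(const PitchedVolume& v,
                               std::size_t x, std::size_t y, std::size_t z) {
    return (z * v.ysize + y) * v.pitch + x * sizeof(int);
}

// Copy the used part of every padded row into a tightly packed int array.
std::vector<int> unpack(const PitchedVolume& v) {
    assert(v.xsize % sizeof(int) == 0 && v.xsize <= v.pitch);
    const std::size_t width = v.xsize / sizeof(int); // elements per row
    std::vector<int> out(width * v.ysize * v.depth);
    for (std::size_t z = 0; z < v.depth; ++z) {
        for (std::size_t y = 0; y < v.ysize; ++y) {
            const std::size_t row = z * v.ysize + y;
            std::memcpy(out.data() + row * width,
                        v.bytes.data() + row * v.pitch,
                        v.xsize); // skip the padding at the end of each row
        }
    }
    return out;
}
```

With pitch = 2048 and xsize = 1736, each packed row holds 434 ints and the trailing 312 padding bytes of every row are dropped. If the row width is interpreted in the wrong unit, the rows get misaligned and the result looks scrambled.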

Has anybody else noticed this problem? Or am I misjudging something? If not, I will file this as a bug report… :)