cudaMalloc3D (mistake in reference manual?)

I’ve copied and pasted the following out of the CUDA programming guide v3.0:

However, this is from the reference manual:

So either the reference manual or the programming guide is wrong. My guess is that the mistake is in the programming guide, i.e. the extent width is the row size in bytes, not in floats. If that’s the case, then this would fix the loop:

char* row = slice + y * pitch;
for (size_t x = 0; x < extent.width; x += sizeof(float)) {
  // x is a byte offset into the row, so step one float at a time
  float element = *(float*)(row + x);
}

Is this correct or am I missing something?
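
Assuming the width really is in bytes, the byte-stepped loop above and an ordinary element-indexed loop should visit exactly the same values. Here is a minimal host-only C++ sketch of that equivalence (no CUDA calls involved). The function names sum_row_by_bytes and sum_row_by_elements are made up for illustration, and memcpy is used in place of the pointer cast only to keep the host code free of aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: walk one row with x counted in BYTES,
// as in the proposed fix (x += sizeof(float)).
inline float sum_row_by_bytes(const char* row, std::size_t width_bytes) {
    float total = 0.0f;
    for (std::size_t x = 0; x < width_bytes; x += sizeof(float)) {
        float element;
        std::memcpy(&element, row + x, sizeof(float));
        total += element;
    }
    return total;
}

// Hypothetical helper: walk the same row with x counted in ELEMENTS.
// The byte width is converted to an element count first.
inline float sum_row_by_elements(const char* row, std::size_t width_bytes) {
    assert(width_bytes % sizeof(float) == 0);
    const std::size_t n = width_bytes / sizeof(float);
    float total = 0.0f;
    for (std::size_t x = 0; x < n; ++x) {
        float element;
        std::memcpy(&element, row + x * sizeof(float), sizeof(float));
        total += element;
    }
    return total;
}
```

Both functions take the row width in bytes; only the meaning of the loop counter differs.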

I’ve found that it’s the width field of the extent which needs to be in bytes:

cudaExtent extent = make_cudaExtent( 64*sizeof(T), 64, 64);
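
To make the byte/element distinction concrete, here is a small host-only C++ sketch. Extent3D, make_extent_for and elements_per_row are hypothetical names that only mirror the shape of cudaExtent and the make_cudaExtent call above; they are not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cudaExtent (not the real CUDA type):
// width is in BYTES, height and depth are counts of rows and slices.
struct Extent3D {
    std::size_t width;   // bytes per row
    std::size_t height;  // rows per slice
    std::size_t depth;   // number of slices
};

// Build an extent for an nx * ny * nz volume of T, mirroring
// make_cudaExtent(nx * sizeof(T), ny, nz).
template <typename T>
Extent3D make_extent_for(std::size_t nx, std::size_t ny, std::size_t nz) {
    return Extent3D{nx * sizeof(T), ny, nz};
}

// Recover the number of T elements in one row from the byte width.
template <typename T>
std::size_t elements_per_row(const Extent3D& e) {
    assert(e.width % sizeof(T) == 0 && "width must be a whole number of elements");
    return e.width / sizeof(T);
}
```

With this, make_extent_for<float>(64, 64, 64) gives a width of 64 * sizeof(float) bytes, and elements_per_row<float> turns that back into 64 elements for use as a loop bound.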

This has come up a few times on the forums (I’ve been one of them!), but I don’t think it’s been fixed yet in the manual. On the GPU, you then access elements via

__device__ T operator()( const unsigned int ix,
                         const unsigned int iy,
                         const unsigned int iz ) const {
  const char* data = reinterpret_cast<const char*>(this->pitchedPtr);
  // Rows are pitch bytes apart
  const size_t pitch = this->dataPitch;
  // Slices are slicePitch bytes apart
  const size_t slicePitch = pitch * this->dims.y;

  const char* slice = data + ( iz * slicePitch );
  const char* row = slice + ( iy * pitch );

  // Only the final index is in elements of T; everything before it is in bytes
  return( reinterpret_cast<const T*>(row)[ix] );
}


where the dataPitch member is copied from the pitch field of the cudaPitchedPtr struct.
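
If you want to check the address arithmetic without a GPU, the same indexing can be emulated on the host. This is only a sketch: PitchedView and its members are hypothetical names, the pitch is whatever you pass in (on the device it would come from cudaPitchedPtr), and memcpy replaces the reinterpret_cast so the host code does not depend on alignment or aliasing rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side view over a pitched 3D buffer of T.
template <typename T>
struct PitchedView {
    char* data;              // start of the buffer
    std::size_t pitch;       // bytes between consecutive rows (may include padding)
    std::size_t nx, ny, nz;  // volume size in ELEMENTS

    // Byte offset of element (ix, iy, iz): slices are pitch * ny bytes apart,
    // rows are pitch bytes apart, elements are sizeof(T) bytes apart.
    std::size_t byte_offset(std::size_t ix, std::size_t iy, std::size_t iz) const {
        assert(ix < nx && iy < ny && iz < nz);
        assert(pitch >= nx * sizeof(T));
        const std::size_t slicePitch = pitch * ny;
        return iz * slicePitch + iy * pitch + ix * sizeof(T);
    }

    // Read one element.
    T load(std::size_t ix, std::size_t iy, std::size_t iz) const {
        T value;
        std::memcpy(&value, data + byte_offset(ix, iy, iz), sizeof(T));
        return value;
    }

    // Write one element.
    void store(std::size_t ix, std::size_t iy, std::size_t iz, const T& value) {
        std::memcpy(data + byte_offset(ix, iy, iz), &value, sizeof(T));
    }
};
```

For example, a 5 x 3 x 2 volume of floats with an assumed pitch of 32 bytes needs a buffer of 32 * 3 * 2 = 192 bytes, and element (4, 2, 1) sits at byte offset 1 * 96 + 2 * 32 + 4 * sizeof(float).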