I’ve copied and pasted the following out of the CUDA programming guide v3.0:
However, this is from the reference manual:
So either the reference manual or the programming guide is wrong. My guess is that the mistake is in the programming guide, i.e. the extent width is given in bytes, not in floats. If that’s the case, then stepping through each row in bytes would fix the mistake:
char * row = slice + y * pitch;
// extent.width is treated as a byte count here, so step one float at a time
for (int x = 0; x < extent.width; x += sizeof(float))
{
    float element = *(float *)(row + x);
}
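To check the byte-stepping loop without a GPU, here is a minimal host-only sketch. It assumes the "width is in bytes" reading guessed at above; the helper names `makePaddedSlice` and `sumSlice` are made up for illustration and are not part of the CUDA API. The padding at the end of each row is filled with junk so that reading past the valid width would change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: builds one host-side "slice" whose rows are `pitch`
// bytes apart. Only the first `widthBytes` of each row hold valid floats;
// the rest is padding filled with the junk byte 0x7F.
std::vector<char> makePaddedSlice(std::size_t pitch, std::size_t widthBytes,
                                  std::size_t height, float value)
{
    assert(widthBytes % sizeof(float) == 0 && widthBytes <= pitch);
    std::vector<char> slice(pitch * height, char(0x7F));
    for (std::size_t y = 0; y < height; ++y) {
        char* row = slice.data() + y * pitch;
        for (std::size_t x = 0; x < widthBytes; x += sizeof(float)) {
            std::memcpy(row + x, &value, sizeof(float));
        }
    }
    return slice;
}

// Hypothetical helper: sums every valid float in the slice, stepping through
// each row in bytes, the same way as the corrected loop above.
float sumSlice(const char* slice, std::size_t pitch,
               std::size_t widthBytes, std::size_t height)
{
    float total = 0.0f;
    for (std::size_t y = 0; y < height; ++y) {
        const char* row = slice + y * pitch;
        for (std::size_t x = 0; x < widthBytes; x += sizeof(float)) {
            float element;
            std::memcpy(&element, row + x, sizeof(float)); // alignment-safe read
            total += element;
        }
    }
    return total;
}
```

With a pitch of 16 bytes, a width of 12 bytes (three floats) and two rows, the sum only covers the six valid elements and never touches the padding. If the width were passed as an element count (3) instead, the same loop would read just one float per row.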
This has come up a few times on the forums (I was one of the people who raised it!), but I don’t think it’s been fixed in the manual yet. On the GPU, you can then access elements via:
__device__ T operator()( const unsigned int ix,
                         const unsigned int iy,
                         const unsigned int iz ) const {
    const char* data = reinterpret_cast<const char*>(this->pitchedPtr);
    // Rows are pitch apart
    const size_t pitch = this->dataPitch;
    // Slices are slicePitch apart
    const size_t slicePitch = pitch * this->dims.y;
    const char* slice = data + ( iz * slicePitch );
    const char* row = slice + ( iy * pitch );
    return( reinterpret_cast<const T*>(row)[ix] );
}
Here the dataPitch member is copied from the pitch field of the cudaPitchedPtr struct.
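The same addressing arithmetic can be tried on the host. The following is only a sketch: `PitchedView`, `Dims` and `elementOffset` are hypothetical names, the buffer is an ordinary host vector rather than memory from cudaMalloc3D, and dataPitch is set by hand, whereas in real code it would come from the pitch field of the cudaPitchedPtr struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the dims member used by the accessor (in elements).
struct Dims { std::size_t x, y, z; };

// Byte offset of element (ix, iy, iz): rows are `pitch` bytes apart and
// slices are `pitch * height` bytes apart.
inline std::size_t elementOffset(std::size_t pitch, std::size_t height,
                                 std::size_t elemSize,
                                 std::size_t ix, std::size_t iy, std::size_t iz)
{
    const std::size_t slicePitch = pitch * height;
    return iz * slicePitch + iy * pitch + ix * elemSize;
}

// Hypothetical host-side mirror of the device accessor above.
template <typename T>
struct PitchedView {
    const char* pitchedPtr; // start of the pitched allocation
    std::size_t dataPitch;  // bytes per row, including padding
    Dims dims;              // logical size in elements

    T operator()(std::size_t ix, std::size_t iy, std::size_t iz) const {
        assert(ix < dims.x && iy < dims.y && iz < dims.z);
        T value;
        std::memcpy(&value,
                    pitchedPtr + elementOffset(dataPitch, dims.y, sizeof(T),
                                               ix, iy, iz),
                    sizeof(T));
        return value;
    }
};
```

For a 3 x 2 x 2 array of floats with a 16-byte pitch, element (2, 1, 1) sits at byte 1 * 32 + 1 * 16 + 2 * 4 = 56. Note that only the x index is scaled by the element size; the y and z strides are already in bytes.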