I found that the pitch returned by cudaMallocPitch()
is a multiple of 512 on my M2090 with CUDA 5.0.
Can I access the gap after each row's data, i.e. byte offsets [width, pitch) within a row,
and the tail after the last row's data, [(height - 1) * pitch + width, height * pitch)?
In other words, is the following code safe and allowed?
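#include <cuda_runtime.h>

int main() {
    // init
    const char* buf = "hello world!";  // 12 chars; the terminating NUL is not copied
    char* pMem;
    size_t pitch;
    // alloc dev mem: 2 rows, 1 byte of requested data per row
    cudaMallocPitch((void**)&pMem, &pitch, 1, 2);
    // write gap: bytes [1, 13) of row 0, past its 1-byte width
    cudaMemcpy(&pMem[1], buf, 12, cudaMemcpyHostToDevice);
    // write tail: bytes [pitch + 1, pitch + 13), past the last row's width
    cudaMemcpy(&pMem[pitch + 1], buf, 12, cudaMemcpyHostToDevice);
    cudaFree(pMem);
    return 0;
}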