For cudaMallocPitch(), the address of an element is computed as

T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;

Do the indices Row and Column run over [0, height-1] and [0, width-1], respectively, or over [1, height] and [1, width]? Thanks!

This is C - all element indexing starts at 0, so Row runs over [0, height-1] and Column over [0, width-1].
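To make the zero-based ranges concrete, here is a minimal host-side sketch in plain C. It does not call the CUDA runtime: the 64-byte alignment and the helper names round_up_pitch, pitched_elem and last_element are assumptions made up for illustration, and the real pitch returned by cudaMallocPitch() depends on the device. Only the address arithmetic is the same as in the formula from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: round a row width in bytes up to a multiple of
   `align`, imitating the padding cudaMallocPitch() may add per row. */
static size_t round_up_pitch(size_t width_bytes, size_t align) {
    assert(align > 0);
    return (width_bytes + align - 1) / align * align;
}

/* Address of element (row, col): same arithmetic as the formula above,
   with T = float. */
static float *pitched_elem(void *base, size_t pitch, size_t row, size_t col) {
    return (float *)((char *)base + row * pitch) + col;
}

/* Fill a pitched buffer with `height` rows of `width` floats and return
   the value stored at the last valid index (height - 1, width - 1). */
static float last_element(size_t width, size_t height) {
    size_t pitch = round_up_pitch(width * sizeof(float), 64); /* assumed alignment */
    void *base = malloc(pitch * height);
    if (base == NULL) {
        return -1.0f; /* allocation failed */
    }
    for (size_t row = 0; row < height; ++row) {       /* row in [0, height-1] */
        for (size_t col = 0; col < width; ++col) {    /* col in [0, width-1]  */
            *pitched_elem(base, pitch, row, col) = (float)(row * width + col);
        }
    }
    float last = *pitched_elem(base, pitch, height - 1, width - 1);
    free(base);
    return last;
}
```

For a buffer with width 5 and height 3, last_element(5, 3) returns 14.0f, the value written at Row = 2, Column = 4. Indexing Row = 3 or Column = 5 would fall outside the logical array.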

If I instead want to address elements as

T* pElement = (T*)((char*)BaseAddress + Column * pitch) + Row;

do I allocate like this?

cudaMallocPitch((void**)&d_src, &pitch, height * sizeof(float), width);
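As far as I can tell, yes. The third argument of cudaMallocPitch() is the length in bytes of one pitched line and the fourth is the number of lines. With Column * pitch in the formula, each pitched line holds one column of height elements, and there are width such lines. The sketch below checks that reasoning on the host in plain C. The 64-byte alignment and the helper names line_pitch and layout_fits are hypothetical, not part of the CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed pitch: one line of `line_elems` floats, padded to `align` bytes.
   The real value comes from cudaMallocPitch() and is device dependent. */
static size_t line_pitch(size_t line_elems, size_t align) {
    assert(align > 0);
    size_t bytes = line_elems * sizeof(float);
    return (bytes + align - 1) / align * align;
}

/* With pElement = (float*)((char*)base + Column * pitch) + Row, a matrix
   with `height` rows and `width` columns fits only if one line can hold
   `height` floats and there is one line per column.
   Returns 1 if the layout fits, 0 otherwise. */
static int layout_fits(size_t pitch, size_t lines, size_t width, size_t height) {
    return height * sizeof(float) <= pitch && width <= lines;
}
```

For a matrix with width 100 and height 3, layout_fits(line_pitch(3, 64), 100, 100, 3) is 1, matching the call with height * sizeof(float) and width. Keeping the arguments in the original order gives layout_fits(line_pitch(100, 64), 3, 100, 3), which is 0, because only 3 lines are allocated for 100 columns.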