Dereferencing pitched memory

I have a CUDA array allocated with cudaMallocPitch. The array is 2D, with the following struct as its element type:

typedef struct min_max_t {
    unsigned char min;
    unsigned char max;
} min_max_t;

The array is allocated like so:

min_max_t *d_min_max_matrix;
size_t pitch;
cudaMallocPitch(&d_min_max_matrix, &pitch, 8 * sizeof(min_max_t), 8);
for (int y = 0; y < 8; ++y) {
    void *d_row = (char *) d_min_max_matrix + y * pitch;
    min_max_t *h_row = min_max_matrix[y]; // min_max_matrix is a min_max_t** allocated on the host

    cudaMemcpy(d_row, h_row, 8 * sizeof(min_max_t), cudaMemcpyHostToDevice);
}

I understand that the desired row can only be reached by using the pitch value:
min_max_t *d_min_max_matrix_row = (min_max_t *) ( (char *) d_min_max_matrix + y * pitch);

But I don’t understand how to get the element I need in the x direction other than by array indexing.

This works:
d_min_max_matrix_row[x]

But I can’t get this to work:
min_max_t *d_min_max = d_min_max_matrix_row + x * sizeof(min_max_t);
I’m not sure how you are supposed to dereference it.

When you do pointer arithmetic, the offset is automatically scaled by the size of the type the pointer points to. This is a C/C++ language rule and is not specific to CUDA.

Therefore the correct pointer-arithmetic construct to mimic this:

d_min_max_matrix_row[x]

is this:

min_max_t *d_min_max = d_min_max_matrix_row + x;

With those constructs, you should observe that:

*d_min_max ==  d_min_max_matrix_row[x]
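
To make the scaling rule concrete, here is a minimal host-side sketch in plain C++ (no GPU needed). The helper names element_ptr and byte_distance are hypothetical and exist only for illustration; the struct mirrors the one in the question.

```cpp
#include <cstddef>

// Same layout as the struct in the question: two one-byte fields.
struct min_max_t {
    unsigned char min;
    unsigned char max;
};

// Typed pointer arithmetic: `row + x` advances by x ELEMENTS,
// i.e. by x * sizeof(min_max_t) bytes. It is equivalent to &row[x].
inline min_max_t *element_ptr(min_max_t *row, std::size_t x) {
    return row + x;
}

// Distance in bytes between two addresses (hypothetical helper,
// only here to make the automatic scaling visible).
inline std::ptrdiff_t byte_distance(const void *from, const void *to) {
    return static_cast<const char *>(to) - static_cast<const char *>(from);
}
```

For a row of 8 elements, element_ptr(row, 3) equals &row[3], and byte_distance(row, element_ptr(row, 3)) is 3 * sizeof(min_max_t). The expression from the question, row + x * sizeof(min_max_t), scales twice: with a 2-byte struct and x = 3 it would point at element 6 instead of element 3.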

Could you explain why the (char *) is necessary in finding the row then? It seems odd to cast to another pointer and then back to my original pointer, but that’s what I saw everyone else doing in tutorials and forums. I think that threw me off and made me think I had to do more than just d_min_max_matrix_row[x].

Because the pitch is specified in bytes, not in number of elements. To apply a byte offset when stepping from one row to the next, the base pointer must first be cast to a byte-sized pointer type such as char *; otherwise the pointer arithmetic would scale y * pitch by sizeof(min_max_t) and land at the wrong address. The result is then cast back to min_max_t * so that indexing within the row counts in elements again.
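
The two steps (byte offset for the row, element offset for the column) can be sketched as a pair of small helpers. This is a host-side C++ illustration under assumptions: pitched_row and pitched_element are hypothetical names, and the code only computes addresses. In a real kernel the same expressions would be applied to the device pointer and pitch returned by cudaMallocPitch, with the functions marked __device__.

```cpp
#include <cstddef>

// Same layout as the struct in the question.
struct min_max_t {
    unsigned char min;
    unsigned char max;
};

// Start of row y. The pitch is a byte count, so the offset is applied
// to a char* view of the base pointer, then cast back to the element type.
inline min_max_t *pitched_row(void *base, std::size_t pitch, std::size_t y) {
    return reinterpret_cast<min_max_t *>(static_cast<char *>(base) + y * pitch);
}

// Element (x, y). Within a row, ordinary typed pointer arithmetic applies,
// so x is an element count and needs no sizeof factor.
inline min_max_t *pitched_element(void *base, std::size_t pitch,
                                  std::size_t x, std::size_t y) {
    return pitched_row(base, pitch, y) + x;
}
```

With an assumed pitch of 512 bytes, pitched_element(base, 512, 3, 2) sits 2 * 512 + 3 * sizeof(min_max_t) bytes past base. The actual pitch depends on the device and must always be taken from cudaMallocPitch rather than hard-coded.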