I tried a simple matrix-vector multiply with cublasSgemv, which worked. Then I tried to use cudaMallocPitch to allocate the memory on the device to avoid problems with coalescing. The pitch returned was 64. I presume this means that the 2nd row starts at byte offset 64.
Now I want to use the cublasSgemv routine to multiply with my vector, but I cannot give it a pitch. The only thing I can pass is lda, but this refers to the position where the next column starts.
Am I combining two worlds that should not be combined, or is there an easy solution for this?
I figured out what I had misunderstood:
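cuBLAS stores the matrix column after column (column-major). This means each column is padded out to the pitch, so the pitch is the distance between the starts of consecutive columns. That is what lda describes, only counted in elements rather than bytes, so for floats lda = pitch / sizeof(float). I had thought it was like reading a book, with each row as a line followed by the next line, i.e. the array stored row after row. Now it works with the pitch included.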
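For anyone who runs into the same confusion, here is a small CPU-only sketch in plain C of the layout described above. It is not the cuBLAS code itself, just a reference loop meant to show how a padded column-major matrix is indexed and how a byte pitch turns into an lda. The helper names (colmajor_index, gemv_colmajor, lda_from_pitch) are made up for this illustration, and it assumes float elements and a pitch that is a multiple of sizeof(float).

```c
#include <assert.h>
#include <stddef.h>

/* Index of element (row, col) in a column-major matrix whose columns
   start lda floats apart. lda may be larger than the number of rows;
   the extra entries at the end of each column are padding. */
static size_t colmajor_index(size_t row, size_t col, size_t lda)
{
    return col * lda + row;
}

/* Convert a pitch in bytes (as returned by a pitched allocation) into
   a leading dimension counted in floats. Assumes the pitch is a whole
   number of floats. */
static size_t lda_from_pitch(size_t pitch_bytes)
{
    assert(pitch_bytes % sizeof(float) == 0);
    return pitch_bytes / sizeof(float);
}

/* Reference y = A * x for an m x n column-major matrix with leading
   dimension lda. This mirrors what a non-transposed SGEMV computes
   with alpha = 1 and beta = 0, but runs on the host. */
static void gemv_colmajor(size_t m, size_t n, const float *A, size_t lda,
                          const float *x, float *y)
{
    assert(lda >= m);
    for (size_t i = 0; i < m; ++i) {
        y[i] = 0.0f;
    }
    for (size_t j = 0; j < n; ++j) {          /* walk column by column */
        for (size_t i = 0; i < m; ++i) {      /* contiguous within a column */
            y[i] += A[colmajor_index(i, j, lda)] * x[j];
        }
    }
}
```

With the pitch of 64 bytes from my case, lda_from_pitch gives 16, so column j starts at float offset 16 * j. On the device side this presumably corresponds to calling cudaMallocPitch with a width of m * sizeof(float) bytes and a height of n, so that each "row" of the pitched allocation holds one matrix column.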