cudaMemcpy2D(): reason for pitch limits?

cudaMemcpy2D() fails with a pitch size greater than 2^18 = 262144.

Can anyone tell me the reason behind this seemingly arbitrary limit? As far as I understand, having a pitch for a 2D array just means padding each row to a size that gives every row the same alignment, so that memory accesses stay coalesced. I see no obvious reason why the pitch should have a size limit.
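To make the point about pitch concrete, here is a minimal host-side sketch of the arithmetic involved. The function names and the 256-byte alignment used in the usage note are hypothetical, chosen for illustration only. In real code the pitch is chosen by the runtime (for example by cudaMallocPitch), not computed by hand.

```cpp
#include <cassert>
#include <cstddef>

// Round a row width in bytes up to the next multiple of `alignment`.
// ASSUMPTION: the alignment is a free parameter in this sketch; the
// actual value is decided by the CUDA runtime and the device.
std::size_t padded_pitch(std::size_t width_bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Byte offset of element (row, col) in a pitched 2D array.
// Every row starts at a multiple of `pitch`, so all rows share the
// same alignment relative to the base pointer.
std::size_t element_offset(std::size_t row, std::size_t col,
                           std::size_t pitch, std::size_t elem_size) {
    return row * pitch + col * elem_size;
}
```

For example, with an assumed alignment of 256 bytes, a 1000-byte row would be padded to a pitch of 1024 bytes. Nothing in this arithmetic limits how large the pitch can be, which is why the limit looks arbitrary from the software side.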

(I just ran up against this limit, which is barely documented. The number isn’t in the manual, but there is a brief reference to there being a limit in cudaGetDeviceProperties. This is going to cost me a fair bit of rewriting.)


cudaMemcpy2D() also fails with a height greater than 2^16 = 65536.

Again – why? And if there’s no good reason, is the limit going to go away?
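For the height limit at least, a workaround seems possible: split the copy into bands of rows and issue one copy per band. Below is a minimal sketch of the planning step only, assuming the limit applies per call. RowChunk and plan_row_chunks are hypothetical names, and the maximum row count is passed in as a parameter rather than hard-coded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One band of consecutive rows to be copied in a single call.
struct RowChunk {
    std::size_t first_row;  // index of the first row in the band
    std::size_t row_count;  // number of rows in the band
};

// Split `height` rows into bands of at most `max_rows` rows each.
// ASSUMPTION: the limit is per copy call, so several smaller copies
// can replace one large copy.
std::vector<RowChunk> plan_row_chunks(std::size_t height, std::size_t max_rows) {
    assert(max_rows > 0);
    std::vector<RowChunk> chunks;
    for (std::size_t row = 0; row < height; row += max_rows) {
        chunks.push_back({row, std::min(max_rows, height - row)});
    }
    return chunks;
}
```

Each band would then become one cudaMemcpy2D() call with the source and destination pointers advanced by first_row times their respective pitches, and row_count passed as the height. This does nothing for the pitch limit, since a row that is too wide stays too wide however the rows are grouped.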

I believe these are hardware limitations - there are only a limited number of bits in the hardware copy engines.

We’ll try to make this clearer in the documentation.

Thanks for the quick reply.

I guess this wasn’t an unreasonable limitation in the graphics world. I imagine all sorts of things like this are popping up now that general purpose computing is starting to take off.