What is the stream-ordered equivalent of cudaMallocPitch?

I sometimes see questions like “why is there no managed pitched allocator?” or “how do I handle a pitched allocation in thrust?” Lumping all of these together, my personal conjecture is that the CUDA API developers don’t consider pitched allocations as useful or important as they once were.

From a technical perspective, I can certainly see that pitched allocations would have been noticeably important on cc1.x GPUs, because those GPUs lacked the cache structure found in later GPUs. Those GPUs died out circa 2016. In my own opinion, for the cases I have come across, the effort associated with pitched allocations is no longer worth the presumed benefits. You may have a different view, and you can express it if you wish by filing a bug that suggests the improvement to the CUDA runtime API that you would like to see.
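If you still want pitched rows in a stream-ordered allocation, one workaround is to choose the pitch yourself and hand the padded size to `cudaMallocAsync`. The following is only a minimal sketch of that idea, not an official equivalent of `cudaMallocPitch`. The names `padded_pitch`, `scale_rows`, and `run_pitched_async_demo` are hypothetical. Using `texturePitchAlignment` (with a 256-byte fallback) as the row alignment is my assumption, and it may not match the pitch that `cudaMallocPitch` would pick on your device. It also assumes CUDA 11.2 or newer and a device that supports memory pools.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: round a row width (in bytes) up to a multiple
// of `alignment` bytes.
static size_t padded_pitch(size_t width_bytes, size_t alignment) {
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Each thread scales one element; rows are located via the byte pitch.
__global__ void scale_rows(unsigned char* base, size_t pitch,
                           int width, int height, float factor) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float* row = reinterpret_cast<float*>(base + y * pitch);
        row[x] *= factor;
    }
}

// Hypothetical demo: allocate, fill, scale, copy back, free, all in one stream.
float run_pitched_async_demo() {
    const int width = 1000, height = 64;
    const size_t row_bytes = width * sizeof(float);

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    // Assumption: texturePitchAlignment is a reasonable row alignment;
    // fall back to 256 bytes if the property is reported as zero.
    size_t alignment = prop.texturePitchAlignment ? prop.texturePitchAlignment : 256;
    const size_t pitch = padded_pitch(row_bytes, alignment);

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Stream-ordered allocation of the padded 2D region.
    unsigned char* d_buf = nullptr;
    CHECK(cudaMallocAsync(reinterpret_cast<void**>(&d_buf), pitch * height, stream));

    std::vector<float> h(static_cast<size_t>(width) * height, 1.0f);
    // The host buffer is dense (pitch == row_bytes); the device buffer is padded.
    CHECK(cudaMemcpy2DAsync(d_buf, pitch, h.data(), row_bytes,
                            row_bytes, height, cudaMemcpyHostToDevice, stream));

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    scale_rows<<<grid, block, 0, stream>>>(d_buf, pitch, width, height, 2.0f);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy2DAsync(h.data(), row_bytes, d_buf, pitch,
                            row_bytes, height, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaFreeAsync(d_buf, stream));
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaStreamDestroy(stream));
    return h[0];
}
```

Calling `run_pitched_async_demo()` from `main` exercises the whole path. The only difference from `cudaMallocPitch` is that you own the pitch calculation, so the kernel and the 2D copies must all be given the same pitch value.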

There’s probably some connection with textures as well. In most cases I would rather start with the mechanisms that don’t involve textures, and only turn to textures as a last resort to try to improve performance. Again, this is just my own opinion and conjecture, but I sometimes wonder whether considerations like these influence where effort is invested in API development.
