I am wondering if the comments in cuda.h are still up to date for cuMemAllocPitch:
It mentions 4, 8 or 16 bytes for ElementSize.
The programming guide mentions alignments of 32, 64 and 128 bytes. Maybe that is a documentation mistake in the guide (220.127.116.11) and it should be bits?
Or perhaps alignment is something other than these element sizes…
Also, are larger element sizes supported? Just wondering…
And what would happen if ElementSizeBytes is set to something unusual like 3 or 40?
//
// \brief Allocates pitched device memory
//
// Allocates at least WidthInBytes * Height bytes of linear memory on
// the device and returns in dptr a pointer to the allocated memory. The
// function may pad the allocation to ensure that corresponding pointers in
// any given row will continue to meet the alignment requirements for
// coalescing as the address is updated from row to row. ElementSizeBytes
// specifies the size of the largest reads and writes that will be performed
// on the memory range. ElementSizeBytes may be 4, 8 or 16 (since coalesced
// memory transactions are not possible on other data sizes). If
// ElementSizeBytes is smaller than the actual read/write size of a kernel,
// the kernel will run correctly, but possibly at reduced speed. The pitch
// returned in pPitch by cuMemAllocPitch() is the width in bytes of the
// allocation. The intended usage of pitch is as a separate parameter of the
// allocation, used to compute addresses within the 2D array. Given the row
// and column of an array element of type T, the address is computed as:
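A minimal host-side sketch of that row/column addressing, in case it helps to see it outside the comment. Nothing here calls the driver: `paddedPitch` and `elementAt` are hypothetical helper names, and the 64-byte alignment used in the usage note is only an assumed value for illustration. The real pitch is whatever cuMemAllocPitch() returns in pPitch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: round a row width up to a multiple of `alignment`
// bytes. This only imitates the idea of row padding; the actual pitch is
// chosen by the driver and must be taken from pPitch.
static std::size_t paddedPitch(std::size_t widthInBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (widthInBytes + alignment - 1) / alignment * alignment;
}

// Address of element (row, column) in a pitched 2D allocation:
// advance `row` rows in BYTES (pitch is a byte count), then index
// by `column` in units of T.
template <typename T>
static T* elementAt(void* base, std::size_t pitch,
                    std::size_t row, std::size_t column) {
    assert(base != nullptr);
    return reinterpret_cast<T*>(static_cast<char*>(base) + row * pitch) + column;
}
```

For example, with 10 floats per row (40 bytes) and the assumed 64-byte alignment, `paddedPitch(40, 64)` gives a pitch of 64, and `elementAt<float>(base, 64, 2, 3)` points 2 * 64 + 3 * sizeof(float) bytes past `base`. The point is that the row step uses the pitch, not the logical row width.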