I want to allocate an H x W array on the device with unified memory. I will use cuMemcpy2D with it.
Unfortunately, if W is not a multiple of the required alignment (which would make the array naturally pitched, with no gaps between rows), the memcpy fails with an invalid-parameter error.
If I use cuMemAllocPitch, the memcpy works, but AFAIK this would NOT be unified memory and I cannot use the CUdeviceptr value as a pointer on the host.
In order to use cuMemAllocManaged, I will need to calculate the pitch corresponding to W myself, then multiply it by H (times the element size). That means I need to know how cuMemAllocPitch calculates the pitch.
On my Kepler, the pitch is the row size in bytes rounded up to a multiple of 512 bytes (determined by experiment). On other devices this could be different. Will CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT always give the right value?
My question is: How can I have an H x W array in unified memory if W is not properly aligned?
Also: Is linear device memory automatically managed by the unified memory system (UMS), or is there a way I can make it so?
An enhancement request: provide a cuMemAllocManagedPitch routine (and similarly for the Runtime API).
- There is no pitched unified memory (UM).
- Textures have no connection to UM.
I tried cuMemcpy2D to UM with an unaligned pitch and it was happy.
If for some reason I want a pitch anyway, I can always allocate the memory as Height rows of Pitch elements (Pitch = Width + padding) and make the kernel aware of the pitch when accessing it.
Regarding textures: if I bind a texref to a linear device memory address that happens to be in UM, that will be OK, right? I have to obey the constraints on address and pitch alignment. That way I can fill the memory using cuMemcpy2D from some other data on the host.
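The manual-pitch idea above can be sketched with the runtime API (a minimal sketch, error checking omitted; the 512-byte alignment and the kernel/variable names are illustrative assumptions, not from the thread):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// The kernel indexes rows by the explicit pitch (in elements),
// not by the logical width W.
__global__ void fill(float *buf, size_t pitchElems, int W, int H) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < W && y < H)
        buf[y * pitchElems + x] = (float)(y * W + x);
}

int main() {
    const int W = 100, H = 64;            // W deliberately not aligned
    const size_t align = 512;             // assumed; device-dependent
    size_t pitchBytes = ((W * sizeof(float) + align - 1) / align) * align;
    size_t pitchElems = pitchBytes / sizeof(float);

    float *buf = nullptr;                 // managed: usable on host and device
    cudaMallocManaged(&buf, pitchBytes * H);

    dim3 block(32, 8), grid((W + 31) / 32, (H + 7) / 8);
    fill<<<grid, block>>>(buf, pitchElems, W, H);
    cudaDeviceSynchronize();              // required before host access on pre-Pascal

    printf("buf[1][0] = %f\n", buf[1 * pitchElems + 0]);
    cudaFree(buf);
    return 0;
}
```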
Answers to my two questions:
- It doesn't matter if W is not properly aligned, unless the memory is going to be bound to a texture reference. cuMemcpy2D and memset will always work.
- No, linear (cuMemAlloc[Pitch]) device memory is not managed by UMS, but this is not necessary.
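To illustrate the first answer, here is a minimal driver-API sketch of a cuMemcpy2D from the host into managed memory with a tight (unaligned) pitch; error checking is omitted and the sizes are illustrative assumptions:

```cuda
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    const int W = 100, H = 16;               // 400-byte rows, not 512-aligned
    const size_t pitch = W * sizeof(float);  // tight pitch, no padding

    CUdeviceptr um;                          // managed allocation
    cuMemAllocManaged(&um, pitch * H, CU_MEM_ATTACH_GLOBAL);

    float host[H][W];
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            host[y][x] = (float)(y * W + x);

    CUDA_MEMCPY2D cp = {};
    cp.srcMemoryType = CU_MEMORYTYPE_HOST;
    cp.srcHost       = host;
    cp.srcPitch      = pitch;
    cp.dstMemoryType = CU_MEMORYTYPE_DEVICE; // UM is addressed as device memory
    cp.dstDevice     = um;
    cp.dstPitch      = pitch;
    cp.WidthInBytes  = pitch;
    cp.Height        = H;
    cuMemcpy2D(&cp);                         // succeeds despite the unaligned pitch

    cuCtxSynchronize();
    printf("um[1][0] = %f\n", ((float *)um)[1 * W]); // host read via the UM pointer
    cuMemFree(um);
    cuCtxDestroy(ctx);
    return 0;
}
```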
AFAIK, UM cannot be the backing store for a texture.