CUDA Unified Memory lets you allocate a single buffer shared by the CPU and GPU with cudaMallocManaged. Is there a way
to guarantee that the memory will be aligned on a specific boundary (16, 32, or 64 bytes)
when I use it on the CPU, as posix_memalign does?
You do not need to worry about alignment:
“The allocated memory is suitably aligned for any kind of variable.”
This guarantees that the returned pointer is aligned to at least 16 bytes (enough for uint4 and similar types), or to a larger boundary if a future CUDA release introduces a larger type.
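If you want to confirm the alignment on your own system rather than rely on the documentation, you can test the returned pointer at runtime. A minimal sketch: is_aligned is a hypothetical helper, not part of CUDA, and it works on any pointer, including one returned by cudaMallocManaged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a CUDA API): returns true if p is a multiple of
// `alignment` bytes. `alignment` must be a nonzero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

For example, calling is_aligned(a, 64) right after the allocation tells you whether the buffer is safe to use with 64-byte aligned SIMD loads on the CPU.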
OK, I will do without it. In short, on the CPU side I have some SIMD #pragma directives that assume the memory is aligned to the CPU's SIMD register width.
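#pragma vector aligned // Intel compiler pragma; the OpenMP 4 form is "#pragma omp simd aligned(a, b : N)"
for (int i = 0; i < n; ++i)
    a[i] = b[i]; // imagine a and b were allocated with cudaMallocManaged: this may crash if they are not aligned to the SIMD width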
If in the future the CUDA team could implement this feature (specifying the alignment for the CPU side) through an optional argument to cudaMallocManaged, that would be nice.
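Until then, one possible workaround is to over-allocate and round the pointer up by hand. This is only a sketch under my own assumptions: align_up and padded_size are hypothetical helpers, not CUDA functions, and the alignment must be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round p up to the next multiple of `alignment` bytes.
// `alignment` must be a nonzero power of two.
inline void* align_up(void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>((addr + mask) & ~mask);
}

// Hypothetical helper: number of bytes to request so that an aligned block
// of `bytes` bytes always fits inside the allocation after rounding up.
inline std::size_t padded_size(std::size_t bytes, std::size_t alignment) {
    return bytes + alignment - 1;
}
```

The idea would be to call cudaMallocManaged(&raw, padded_size(bytes, 64)), use align_up(raw, 64) as the working pointer on both the CPU and the GPU, and keep raw around, because cudaFree must receive the original pointer and not the rounded one.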