I tried page-locking a buffer with VirtualLock, but I don't see any improvement at all in host-to-device (CPU to GPU) transfer speed.
However, memory allocated with cudaMallocHost transfers to the GPU at about 2X the speed of “regular” (pageable) memory! Why is this happening? (Possibly alignment issues?)
Do you guys have ideas on this?
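
For anyone who wants to reproduce the comparison, here is a minimal sketch of how it could be timed. It is not the code behind the numbers above. The 64 MiB buffer size, the repetition count, and the helper names `gb_per_s` and `time_h2d` are assumptions made for illustration. A `cudaHostRegister` case is included for contrast, since that call registers an existing host buffer with the CUDA driver, which VirtualLock alone does not do. Most error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <windows.h>        // VirtualAlloc, VirtualLock (Windows only)
#include <cuda_runtime.h>

// Hypothetical helper: convert bytes and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
static double gb_per_s(size_t bytes, float ms) {
    return (bytes / 1.0e9) / (ms / 1.0e3);
}

// Hypothetical helper: time `reps` host-to-device copies with CUDA events, return GB/s.
static double time_h2d(void* dst, const void* src, size_t bytes, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);   // warm-up copy, not timed
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return gb_per_s(bytes * reps, ms);
}

int main() {
    const size_t bytes = 64u << 20;   // 64 MiB, an arbitrary choice
    const int reps = 10;              // arbitrary choice
    void* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Case 1: ordinary pageable memory from malloc.
    void* pageable = malloc(bytes);
    memset(pageable, 1, bytes);       // touch the pages so they are resident
    printf("malloc:           %.2f GB/s\n", time_h2d(dev, pageable, bytes, reps));

    // Case 2: memory locked with VirtualLock. The minimum working set is raised
    // first, because VirtualLock can only lock pages up to roughly that limit.
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             bytes + (16u << 20), bytes + (32u << 20));
    void* locked = VirtualAlloc(nullptr, bytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    memset(locked, 1, bytes);
    if (!VirtualLock(locked, bytes))
        fprintf(stderr, "VirtualLock failed: %lu\n", GetLastError());
    printf("VirtualLock:      %.2f GB/s\n", time_h2d(dev, locked, bytes, reps));

    // Case 3: the same buffer, additionally registered with the CUDA driver.
    if (cudaHostRegister(locked, bytes, cudaHostRegisterDefault) == cudaSuccess) {
        printf("cudaHostRegister: %.2f GB/s\n", time_h2d(dev, locked, bytes, reps));
        cudaHostUnregister(locked);
    }

    // Case 4: page-locked memory allocated by CUDA itself.
    void* pinned = nullptr;
    cudaMallocHost(&pinned, bytes);
    memset(pinned, 1, bytes);
    printf("cudaMallocHost:   %.2f GB/s\n", time_h2d(dev, pinned, bytes, reps));

    cudaFreeHost(pinned);
    VirtualUnlock(locked, bytes);
    VirtualFree(locked, 0, MEM_RELEASE);
    free(pageable);
    cudaFree(dev);
    return 0;
}
```

This should build with something like `nvcc bench.cu -o bench` on Windows (the file name is just a placeholder). If the VirtualLock and malloc cases come out about the same while the cudaHostRegister and cudaMallocHost cases are both faster, that would suggest the difference comes from whether the CUDA driver knows the buffer is pinned rather than from alignment.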