mmap() and cudaMemcpy()

As I understand it, mmap() can map device memory directly into user space, so data can be copied from disk to device memory directly with the help of DMA…

So if the above statement is true, why does CUDA need to copy data from disk to host memory first, and then from host memory to device memory? It could implement mmap() and copy data from disk to device memory directly.
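
For concreteness, here is a minimal sketch of the two-step path I mean, assuming a Linux host and the CUDA runtime API. The helper name load_file_to_device is made up for illustration and error handling is kept to a minimum. Here mmap() only maps the file into host memory; the data reaches the GPU in the separate host-to-device copy.

```cuda
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of CUDA): stage a file through host memory,
// then copy it to the GPU. Returns a device pointer that the caller releases
// with cudaFree(), or NULL on failure.
void *load_file_to_device(const char *path, size_t *nbytes) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;

    // Step 1: disk -> host. The file is mapped into the process address
    // space; its pages live in host RAM (the page cache), not on the GPU.
    void *host = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (host == MAP_FAILED) return NULL;

    // Step 2: host -> device.
    void *dev = NULL;
    if (cudaMalloc(&dev, size) != cudaSuccess) {
        munmap(host, size);
        return NULL;
    }
    if (cudaMemcpy(dev, host, size, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        dev = NULL;
    }

    munmap(host, size);
    if (dev != NULL) *nbytes = size;
    return dev;
}
```

Calling load_file_to_device("input.bin", &n) with a placeholder file name would return a device buffer holding the file contents, but only after the data has passed through host memory.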

Are there any design concerns behind the cudaMemcpy() approach?