On a P40 GPU, a programmer can use cudaMalloc to request device memory, malloc to allocate host memory, and cudaMemcpy to transfer data between them over PCIe. Is there a new API that supports this shared memory architecture?
- cudaMalloc for allocating device memory
- malloc for allocating host memory
- cudaMemcpy for transferring data between host and device
Since the CPU and GPU on TX2 share the same physical memory, it's recommended to use pinned (zero-copy) or unified memory instead, which avoids unnecessary memory copies.
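As a rough illustration of both options, here is a minimal sketch. It assumes unified memory is allocated with `cudaMallocManaged` and pinned zero-copy memory with `cudaHostAlloc` plus `cudaHostGetDevicePointer`, all of which are standard CUDA runtime calls. The kernel `scale` and the helpers `scale_with_managed_memory` and `scale_with_zero_copy` are hypothetical names made up for this example, and error handling is kept to a bare minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiply each element by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Unified memory: one pointer used by both CPU and GPU, no cudaMemcpy.
// Returns the number of wrong elements, or -1 on a CUDA error.
int scale_with_managed_memory(int n, float factor) {
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes directly

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, n, factor);
    // The CPU must wait for the kernel before touching the buffer again.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(data); return -1; }

    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (data[i] != factor) ++mismatches;
    cudaFree(data);
    return mismatches;
}

// Pinned zero-copy memory: the GPU accesses page-locked host memory
// through a mapped device pointer. Older setups may additionally need
// cudaSetDeviceFlags(cudaDeviceMapHost) before any other CUDA call.
// Returns the number of wrong elements, or -1 on a CUDA error.
int scale_with_zero_copy(int n, float factor) {
    float *host = nullptr;
    float *dev = nullptr;
    if (cudaHostAlloc((void **)&host, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostGetDevicePointer((void **)&dev, host, 0) != cudaSuccess) {
        cudaFreeHost(host);
        return -1;
    }
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, n, factor);
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFreeHost(host); return -1; }

    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (host[i] != factor) ++mismatches;
    cudaFreeHost(host);
    return mismatches;
}

int main() {
    printf("managed mismatches:   %d\n", scale_with_managed_memory(1 << 20, 2.0f));
    printf("zero-copy mismatches: %d\n", scale_with_zero_copy(1 << 20, 3.0f));
    return 0;
}
```

Which of the two works better likely depends on the access pattern, so it is worth profiling both on the actual TX2 workload rather than assuming one is always faster.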