Can someone confirm whether zero-copy is the best memory access technique for CUDA on Xavier?
For moving data into or out of the CUDA address space, cudaHostAlloc (with the cudaHostAllocMapped flag) together with cudaHostGetDevicePtr gives you zero-copy access.
For intermediate processing buffers that only need to reside on the GPU, cudaMalloc gives the best texture/caching performance.
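As a rough sketch of how those two kinds of buffers can be combined: the input and output below are mapped pinned allocations (zero-copy), while the scratch buffer between the two kernels comes from cudaMalloc. The kernels, the helper name run_zero_copy_pipeline, and the buffer sizes are made up for illustration. On older toolkits you may also need to call cudaSetDeviceFlags(cudaDeviceMapHost) before any other CUDA call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out of the helper on the first CUDA error (a sketch: no cleanup on failure).
#define CUDA_TRY(call) do { if ((call) != cudaSuccess) return -1; } while (0)

// Stage 1: read the zero-copy input, write to the GPU-only scratch buffer.
__global__ void square(const float *in, float *tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * in[i];
}

// Stage 2: read the scratch buffer, write to the zero-copy output.
__global__ void add_one(const float *tmp, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_zero_copy_pipeline(int n) {
    float *h_in = nullptr, *h_out = nullptr;  // CPU views of the mapped buffers
    float *d_in = nullptr, *d_out = nullptr;  // GPU views of the same memory
    float *d_tmp = nullptr;                   // intermediate buffer, GPU only
    size_t bytes = (size_t)n * sizeof(float);

    // Mapped pinned memory: the GPU accesses it directly, no cudaMemcpy needed.
    CUDA_TRY(cudaHostAlloc((void **)&h_in, bytes, cudaHostAllocMapped));
    CUDA_TRY(cudaHostAlloc((void **)&h_out, bytes, cudaHostAllocMapped));
    CUDA_TRY(cudaHostGetDevicePtr((void **)&d_in, h_in, 0));
    CUDA_TRY(cudaHostGetDevicePtr((void **)&d_out, h_out, 0));

    // Ordinary device allocation for data that never leaves the GPU.
    CUDA_TRY(cudaMalloc((void **)&d_tmp, bytes));

    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    square<<<blocks, threads>>>(d_in, d_tmp, n);
    add_one<<<blocks, threads>>>(d_tmp, d_out, n);
    CUDA_TRY(cudaGetLastError());
    CUDA_TRY(cudaDeviceSynchronize());  // results are visible to the CPU after this

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (float)i * (float)i + 1.0f) ++mismatches;

    cudaFree(d_tmp);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_zero_copy_pipeline(1024));
    return 0;
}
```

Only the buffers the CPU actually touches are mapped; the scratch buffer stays a plain device allocation.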
Beyond that, it is worth checking whether CUDA unified memory (cudaMallocManaged) would be useful for your application.
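If you want to try unified memory, a minimal sketch could look like the following. The kernel and the helper name run_managed_demo are hypothetical; the point is only that a single pointer from cudaMallocManaged is used on both the CPU and the GPU, with a synchronize before the CPU reads the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_managed_demo(int n) {
    int *data = nullptr;

    // One allocation, one pointer, valid on both host and device.
    if (cudaMallocManaged((void **)&data, (size_t)n * sizeof(int)) != cudaSuccess)
        return -1;

    for (int i = 0; i < n; ++i) data[i] = i;  // written by the CPU

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    increment<<<blocks, threads>>>(data, n);  // updated by the GPU

    // On devices that report concurrentManagedAccess == 0, the CPU must not
    // touch managed memory while a kernel is running, so synchronize first.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] != i + 1) ++mismatches;

    cudaFree(data);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_managed_demo(1024));
    return 0;
}
```

Whether this beats explicit zero-copy buffers depends on the access pattern, so it is best to profile both on your own workload.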