Zero-copy: is cudaSetDeviceFlags(cudaDeviceMapHost) actually needed?

I’m trying Zero-Copy with some kernels on an Orin. It’s working fine, and there’s no performance hit unless I try using an atomic function on a mapped-pinned host buffer. However, on a whim, I commented out the call to cudaSetDeviceFlags(cudaDeviceMapHost). My unit tests still work just fine. Is that call actually needed these days? (Currently using CUDA 11.4.4. I don’t know when we’ll upgrade to 12.x.)

I ask because I don’t relish the thought of trying to find the main() of all our unit tests and applications (of which we have around 75) just to add code to set that device flag in an appropriate location.

It’s not needed. The flag is set automatically on 64-bit Linux. See here.

Unified addressing is automatically enabled in 64-bit processes.

All host memory allocated through all devices using cudaMallocHost() and cudaHostAlloc() is always directly accessible from all devices that support unified addressing. This is the case regardless of whether or not the flags cudaHostAllocPortable and cudaHostAllocMapped are specified.
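
If you want to confirm this on your own platform instead of relying on the unit tests happening to pass, a small standalone check along these lines may help. It is only a sketch: the kernel and helper names (incrementAll, zeroCopyWorksWithoutMapHostFlag) are made up for illustration, it assumes device 0 is the GPU you care about, and it deliberately never calls cudaSetDeviceFlags(cudaDeviceMapHost). It queries the unified-addressing and host-mapping attributes, allocates a mapped pinned buffer, and has a kernel write to it through the device pointer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments one element of a buffer
// that physically lives in pinned host memory.
__global__ void incrementAll(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns true if the kernel's writes are visible
// through the host pointer. cudaSetDeviceFlags(cudaDeviceMapHost) is
// intentionally NOT called anywhere in this program.
bool zeroCopyWorksWithoutMapHostFlag(int n) {
    int device = 0, unified = 0, canMap = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&unified, cudaDevAttrUnifiedAddressing, device);
    cudaDeviceGetAttribute(&canMap, cudaDevAttrCanMapHostMemory, device);
    std::printf("unifiedAddressing=%d canMapHostMemory=%d\n", unified, canMap);
    if (!unified || !canMap) return false;

    // Mapped pinned host allocation (the "mapped-pinned host buffer" case).
    int* hostBuf = nullptr;
    if (cudaHostAlloc((void**)&hostBuf, n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) hostBuf[i] = i;

    // Ask the runtime for the device-side alias of the host buffer.
    int* devPtr = nullptr;
    cudaError_t err = cudaHostGetDevicePointer((void**)&devPtr, hostBuf, 0);
    if (err == cudaSuccess) {
        incrementAll<<<(n + 255) / 256, 256>>>(devPtr, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Verify on the host that every element was incremented by the kernel.
    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (hostBuf[i] == i + 1);

    cudaFreeHost(hostBuf);
    return ok;
}

int main() {
    bool ok = zeroCopyWorksWithoutMapHostFlag(1024);
    std::printf("zero-copy without cudaDeviceMapHost: %s\n",
                ok ? "works" : "failed");
    return ok ? 0 : 1;
}
```

If the attribute query reports unified addressing and the check passes, that is consistent with the documentation quoted above, and there should be no need to touch the main() of each test or application.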