Is cudaMallocManaged with the cudaMemAttachHost flag faster than malloc?

I am developing an application on a Jetson TK1. I have created an array using cudaMallocManaged with the cudaMemAttachHost flag. When I use that array in a computation, it is faster than when I use an array allocated with plain malloc. I am wondering what the reason for this behavior could be. Could someone please help me understand it?
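For reference, the comparison can be sketched roughly as below. This is only a minimal illustration, not my actual application: the array size, the repeat count, and the helpers `sum_array` and `time_ms` are hypothetical, and the computation here is a plain CPU-side sum. I am assuming it is built with something like `nvcc -std=c++11`.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical CPU-side workload: sum the array several times.
static double sum_array(const float *a, size_t n, int repeats) {
    double total = 0.0;
    for (int r = 0; r < repeats; ++r)
        for (size_t i = 0; i < n; ++i)
            total += a[i];
    return total;
}

// Time the workload in milliseconds and hand back its result so the
// compiler cannot optimize the loop away.
static double time_ms(const float *a, size_t n, int repeats, double *result) {
    auto t0 = std::chrono::steady_clock::now();
    *result = sum_array(a, n, repeats);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t n = 1 << 22;   // assumed size: about 4M floats (16 MB)
    const int repeats = 10;     // assumed repeat count

    // Ordinary pageable host allocation.
    float *host = static_cast<float *>(malloc(n * sizeof(float)));
    if (host == nullptr) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    // Managed allocation, initially attached to the host.
    float *managed = nullptr;
    cudaError_t err = cudaMallocManaged(&managed, n * sizeof(float),
                                        cudaMemAttachHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n",
                cudaGetErrorString(err));
        free(host);
        return 1;
    }

    // Touch both buffers first so page faults are not part of the timing.
    for (size_t i = 0; i < n; ++i) {
        host[i] = 1.0f;
        managed[i] = 1.0f;
    }

    double r_malloc = 0.0, r_managed = 0.0;
    double ms_malloc = time_ms(host, n, repeats, &r_malloc);
    double ms_managed = time_ms(managed, n, repeats, &r_managed);

    printf("malloc:            %.3f ms (sum %.0f)\n", ms_malloc, r_malloc);
    printf("cudaMallocManaged: %.3f ms (sum %.0f)\n", ms_managed, r_managed);

    cudaFree(managed);
    free(host);
    return 0;
}
```

Both buffers are written once before timing, so the measurement covers only the read loop and not first-touch costs.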


Hi sivaramakrishna,

I'm not sure what your use case is, but memory allocated with cudaMallocManaged is page-locked, as opposed to the regular pageable host memory allocated by malloc(). Using page-locked host memory has several benefits; you could refer to