I am developing an application on the Jetson TK1. I have created an array using cudaMallocManaged with the cudaMemAttachHost flag. When I use that array in a computation, it runs faster than when I use an array allocated with plain malloc. I am wondering what the reason for this behavior could be. Could someone please help me understand it?