Code: https://github.com/zhuzhuoyue/cuda_benchmarks
Using Unified Memory (simpleManaged.cu):
./simpleManaged 400000000
host: MallocManaged: 1.082769
host: init arrays: 3.432402
device: uvm+compute+synchronize: 0.013866
host: access all arrays: 6.175977
host: access all arrays a second time: 0.570206
host: free: 0.382470
total: 11.658073
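To show what the phases in this log correspond to, here is a minimal sketch of a Unified Memory benchmark. It is not the code in simpleManaged.cu. It assumes three `float` arrays, a hypothetical element-wise kernel `add` (c = a + b), and an element count taken from the first command-line argument, as in `./simpleManaged 400000000`. The printed labels mirror the log. The sketch reports seconds; the log above does not state its unit. The key point is that the host and the kernel use the same pointers from `cudaMallocManaged`, and no `cudaMemcpy` is issued.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,             \
              cudaGetErrorString(err_));                             \
      exit(EXIT_FAILURE);                                            \
    }                                                                \
  } while (0)

using Clock = std::chrono::steady_clock;

// Seconds elapsed since t0.
static double since(Clock::time_point t0) {
  return std::chrono::duration<double>(Clock::now() - t0).count();
}

// Hypothetical kernel: c[i] = a[i] + b[i]. The grid-stride loop lets a
// fixed-size grid cover any n.
__global__ void add(const float* a, const float* b, float* c, size_t n) {
  size_t stride = (size_t)blockDim.x * gridDim.x;
  for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
    c[i] = a[i] + b[i];
}

// Runs the benchmark phases; returns the sum over all three arrays.
double run_managed(size_t n) {
  float *a, *b, *c;
  auto t = Clock::now();
  CHECK(cudaMallocManaged(&a, n * sizeof(float)));
  CHECK(cudaMallocManaged(&b, n * sizeof(float)));
  CHECK(cudaMallocManaged(&c, n * sizeof(float)));
  printf("host: MallocManaged: %f\n", since(t));

  t = Clock::now();
  for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
  printf("host: init arrays: %f\n", since(t));

  // The kernel receives the same pointers the host wrote; the driver
  // migrates the data as needed, with no explicit cudaMemcpy.
  t = Clock::now();
  add<<<1024, 256>>>(a, b, c, n);
  CHECK(cudaGetLastError());
  CHECK(cudaDeviceSynchronize());  // required before the host reads managed data
  printf("device: uvm+compute+synchronize: %f\n", since(t));

  // Read every element of every array twice; the first pass also pays
  // for moving the data back to the host.
  double sum = 0.0;
  for (int pass = 0; pass < 2; ++pass) {
    t = Clock::now();
    sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += a[i] + b[i] + c[i];
    printf("host: access all arrays%s: %f\n", pass ? " a second time" : "", since(t));
  }

  t = Clock::now();
  CHECK(cudaFree(a));
  CHECK(cudaFree(b));
  CHECK(cudaFree(c));
  printf("host: free: %f\n", since(t));
  return sum;
}

int main(int argc, char** argv) {
  size_t n = argc > 1 ? strtoull(argv[1], nullptr, 10) : (size_t)1 << 20;
  auto t = Clock::now();
  double sum = run_managed(n);
  printf("total: %f (checksum %.0f)\n", since(t), sum);
  return 0;
}
```

With a = 1 and b = 2, every element contributes 6 to the checksum, so a run with n elements should report 6n.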

Without Unified Memory (simpleMemcpy.cu):
./simpleMemcpy 400000000
host: MallocHost: 1.311571
host: init arrays: 3.348044
device: malloc+copy+compute: 1.734390
host: access all arrays: 2.175081
host: access all arrays a second time: 0.552091
host: free: 0.416059
total: 9.537628
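For comparison, here is a minimal sketch of the explicit-copy variant. It is likewise not the code in simpleMemcpy.cu. It assumes that the "MallocHost" label refers to `cudaMallocHost`, which allocates page-locked (pinned) host memory. It also assumes that the "malloc+copy+compute" phase covers `cudaMalloc` of separate device buffers, the host-to-device copies, the kernel, and the copy of the result back. The kernel `add` and the three-`float`-array layout are the same hypothetical choices as in a Unified Memory version.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,             \
              cudaGetErrorString(err_));                             \
      exit(EXIT_FAILURE);                                            \
    }                                                                \
  } while (0)

using Clock = std::chrono::steady_clock;

static double since(Clock::time_point t0) {
  return std::chrono::duration<double>(Clock::now() - t0).count();
}

// Hypothetical kernel: c[i] = a[i] + b[i], grid-stride loop.
__global__ void add(const float* a, const float* b, float* c, size_t n) {
  size_t stride = (size_t)blockDim.x * gridDim.x;
  for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
    c[i] = a[i] + b[i];
}

double run_memcpy(size_t n) {
  const size_t bytes = n * sizeof(float);
  float *a, *b, *c, *da, *db, *dc;
  auto t = Clock::now();
  CHECK(cudaMallocHost(&a, bytes));  // pinned host memory
  CHECK(cudaMallocHost(&b, bytes));
  CHECK(cudaMallocHost(&c, bytes));
  printf("host: MallocHost: %f\n", since(t));

  t = Clock::now();
  for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
  printf("host: init arrays: %f\n", since(t));

  // Explicit path: separate device buffers, copy in, compute, copy out.
  t = Clock::now();
  CHECK(cudaMalloc(&da, bytes));
  CHECK(cudaMalloc(&db, bytes));
  CHECK(cudaMalloc(&dc, bytes));
  CHECK(cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice));
  CHECK(cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice));
  add<<<1024, 256>>>(da, db, dc, n);
  CHECK(cudaGetLastError());
  // On the default stream this copy waits for the kernel and returns
  // only after the result is in host memory.
  CHECK(cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost));
  printf("device: malloc+copy+compute: %f\n", since(t));

  double sum = 0.0;
  for (int pass = 0; pass < 2; ++pass) {
    t = Clock::now();
    sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += a[i] + b[i] + c[i];
    printf("host: access all arrays%s: %f\n", pass ? " a second time" : "", since(t));
  }

  t = Clock::now();
  CHECK(cudaFree(da));
  CHECK(cudaFree(db));
  CHECK(cudaFree(dc));
  CHECK(cudaFreeHost(a));
  CHECK(cudaFreeHost(b));
  CHECK(cudaFreeHost(c));
  printf("host: free: %f\n", since(t));
  return sum;
}

int main(int argc, char** argv) {
  size_t n = argc > 1 ? strtoull(argv[1], nullptr, 10) : (size_t)1 << 20;
  auto t = Clock::now();
  double sum = run_memcpy(n);
  printf("total: %f (checksum %.0f)\n", since(t), sum);
  return 0;
}
```

Here the data transfer is paid inside the timed device phase, whereas in the managed version it is spread over the device phase and the first host access.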