From the sample NVIDIA_CUDA-8.0_Samples/0_Simple/simpleZeroCopy/simpleZeroCopy.cu, it looks like there are two ways to share memory between the CPU and GPU address spaces.
When I tried cudaHostRegister to map existing memory, already allocated on the CPU, into the GPU address space, it failed; it seems this API is not supported on the TX1 with CUDA 8.0.
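For context, this is a minimal sketch of the attempt described above. The buffer size and error handling are illustrative assumptions, not taken from my actual code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    // Existing CPU memory from malloc (could equally come from mmap).
    void *host = malloc(bytes);

    // Try to pin the existing allocation and map it into GPU space.
    cudaError_t err = cudaHostRegister(host, bytes, cudaHostRegisterMapped);
    if (err != cudaSuccess) {
        // This is the failure seen on TX1 with CUDA 8.0.
        printf("cudaHostRegister failed: %s\n", cudaGetErrorString(err));
        free(host);
        return 1;
    }

    // On platforms where registration succeeds, fetch the device alias.
    void *dev = nullptr;
    cudaHostGetDevicePointer(&dev, host, 0);
    // ... launch kernels on dev ...

    cudaHostUnregister(host);
    free(host);
    return 0;
}
```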
In my case I need to map existing memory obtained elsewhere (e.g. from mmap or malloc) and then do accelerated computing on it on the GPU, so cudaMallocHost cannot meet my requirements.
Is there any other way to do zero-copy from CPU to GPU space, the way cudaHostRegister does? I/O latency is the main source of delay in my computation.
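For comparison, the zero-copy path that the simpleZeroCopy sample itself uses allocates the host buffer through the CUDA runtime with cudaHostAlloc(cudaHostAllocMapped), rather than registering memory allocated elsewhere. A minimal sketch (the kernel and sizes are my own illustrative choices):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that writes through the mapped host buffer.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Must be set before any mapped allocation is made.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 1024;
    float *host = nullptr;
    // The allocation comes from the CUDA runtime, not malloc/mmap.
    cudaHostAlloc(&host, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Device-side alias of the same physical memory: no cudaMemcpy needed.
    float *dev = nullptr;
    cudaHostGetDevicePointer(&dev, host, 0);

    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaDeviceSynchronize();

    // The GPU's writes are visible directly in the host pointer.
    printf("host[0] = %f\n", host[0]);
    cudaFreeHost(host);
    return 0;
}
```

This works on TX1, but it does not answer the question above, because the allocation has to originate from cudaHostAlloc; it cannot adopt a buffer that already exists.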