Hello,
On Tegra, is Unified Memory faster than explicit cudaMemcpy for the following cases?
Case A:
  write to CPU memory
  cudaMemcpyHostToDevice
  for (several times) {
      access GPU memory
  }
  cudaMemcpyDeviceToHost
  read from CPU memory
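Case A can be written both ways so the two can be timed on the actual board. This is a minimal sketch under stated assumptions. The kernel `addOne`, the functions `caseA_explicit` / `caseA_managed`, the array size and the iteration count are all hypothetical stand-ins for the real workload, and error checking is omitted for brevity. The managed version uses `cudaMallocManaged` and drops both `cudaMemcpy` calls. On Tegra the CPU and GPU share the same physical DRAM, so the explicit version's copies move data from one region of that DRAM to another.

```cuda
#include <cstdio>
#include <cstdlib>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for "access GPU memory": add 1 to every element.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Case A with explicit copies: one H2D copy, several kernels, one D2H copy.
double caseA_explicit(int n, int iterations) {
    size_t bytes = n * sizeof(float);
    float* h = static_cast<float*>(malloc(bytes));
    for (int i = 0; i < n; ++i) h[i] = 0.0f;          // write to CPU memory
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it)
        addOne<<<blocks, threads>>>(d, n);             // access GPU memory
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // waits for the kernels
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += h[i];           // read from CPU memory
    cudaFree(d);
    free(h);
    return sum;
}

// Case A with Unified Memory: one pointer shared by CPU and GPU, no cudaMemcpy.
double caseA_managed(int n, int iterations) {
    size_t bytes = n * sizeof(float);
    float* m = nullptr;
    cudaMallocManaged(&m, bytes);
    for (int i = 0; i < n; ++i) m[i] = 0.0f;          // write to CPU memory
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it)
        addOne<<<blocks, threads>>>(m, n);             // access GPU memory
    cudaDeviceSynchronize();  // wait for the kernels before the CPU reads m
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += m[i];           // read from CPU memory
    cudaFree(m);
    return sum;
}

int main() {
    const int n = 1 << 20, iterations = 10;
    using clk = std::chrono::steady_clock;
    cudaFree(nullptr);  // create the CUDA context up front so it is not timed
    auto t0 = clk::now();
    double a = caseA_explicit(n, iterations);
    auto t1 = clk::now();
    double b = caseA_managed(n, iterations);
    auto t2 = clk::now();
    printf("explicit: sum=%.0f  %.3f ms\n", a,
           std::chrono::duration<double, std::milli>(t1 - t0).count());
    printf("managed:  sum=%.0f  %.3f ms\n", b,
           std::chrono::duration<double, std::milli>(t2 - t1).count());
    return 0;
}
```

These timings include allocation and host initialization. A more careful benchmark would time only the loop and the copies, and would repeat each variant several times.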
Case B:
  write to CPU memory
  for (several times) {
      cudaMemcpyHostToDevice
      access GPU memory
      cudaMemcpyDeviceToHost
      access CPU memory
  }
  read from CPU memory
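Case B differs because both copies sit inside the loop. With Unified Memory they become a single `cudaDeviceSynchronize()` before the CPU touches the data. This is a minimal sketch under the same assumptions as before. `addOne`, `caseB_explicit`, `caseB_managed` and the CPU-side "+1" step are hypothetical placeholders for the real GPU and CPU work, and error checking is omitted.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for "access GPU memory": add 1 to every element.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Case B with explicit copies: H2D and D2H copies on every iteration.
double caseB_explicit(int n, int iterations) {
    size_t bytes = n * sizeof(float);
    float* h = static_cast<float*>(malloc(bytes));
    for (int i = 0; i < n; ++i) h[i] = 0.0f;              // write to CPU memory
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
        addOne<<<blocks, threads>>>(d, n);                 // access GPU memory
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // waits for the kernel
        for (int i = 0; i < n; ++i) h[i] += 1.0f;          // access CPU memory
    }
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += h[i];               // read from CPU memory
    cudaFree(d);
    free(h);
    return sum;
}

// Case B with Unified Memory: the per-iteration copies become one synchronize.
double caseB_managed(int n, int iterations) {
    size_t bytes = n * sizeof(float);
    float* m = nullptr;
    cudaMallocManaged(&m, bytes);
    for (int i = 0; i < n; ++i) m[i] = 0.0f;
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        addOne<<<blocks, threads>>>(m, n);
        cudaDeviceSynchronize();  // GPU must be done before the CPU touches m
        for (int i = 0; i < n; ++i) m[i] += 1.0f;
    }
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += m[i];
    cudaFree(m);
    return sum;
}

int main() {
    const int n = 1 << 20, iterations = 10;
    printf("explicit: %.0f\n", caseB_explicit(n, iterations));
    printf("managed:  %.0f\n", caseB_managed(n, iterations));
    return 0;
}
```

Both variants should produce the same sum. Wrapping each call in the same timing code as the case A sketch shows how much the removed per-iteration copies are actually worth on a given Tegra board.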
My guess is that Unified Memory would make case B faster, because it removes the two copies inside the loop. In case A the copies happen only once, so I don't expect it to be faster there.
Regards
Jin