I have learned that cudaMemcpy is the function used for "host to device" memory copies.
So I would like to know how to decrease its execution time. Should I upgrade the DRAM or the graphics card?
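Pin the host memory (allocate it with cudaMallocHost or cudaHostAlloc instead of malloc), so the copy can be done by DMA directly from your buffer without an extra staging copy.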
If you have a PCIe Gen2 link (or an older generation), switch to a card and system that both support a Gen3 link; Gen3 gives roughly twice the per-lane bandwidth of Gen2.
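
To see how much pinning helps on your own machine, something like the following rough sketch may be useful. It times one host-to-device cudaMemcpy from a pageable (malloc) buffer and one from a pinned (cudaMallocHost) buffer using CUDA events. The helper name time_h2d_copy_ms, the CHECK macro, and the 256 MiB buffer size are my own assumptions for illustration, not anything from the question, and the numbers you get will depend on your system.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical helper (not a CUDA API): times one host-to-device
// cudaMemcpy of `bytes` bytes and returns the elapsed milliseconds.
// pinned == true  -> host buffer from cudaMallocHost (page-locked)
// pinned == false -> host buffer from malloc (pageable)
float time_h2d_copy_ms(bool pinned, size_t bytes) {
    void *h_buf = nullptr;
    void *d_buf = nullptr;

    if (pinned) {
        CHECK(cudaMallocHost(&h_buf, bytes));
    } else {
        h_buf = malloc(bytes);
        if (h_buf == nullptr) {
            fprintf(stderr, "malloc failed\n");
            exit(EXIT_FAILURE);
        }
    }
    memset(h_buf, 1, bytes);  // touch the pages so they are really allocated
    CHECK(cudaMalloc(&d_buf, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up copy so one-time initialization cost is not measured.
    CHECK(cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_buf));
    if (pinned) {
        CHECK(cudaFreeHost(h_buf));
    } else {
        free(h_buf);
    }
    return ms;
}

int main() {
    const size_t bytes = 256u << 20;  // 256 MiB, an arbitrary test size

    float pageable_ms = time_h2d_copy_ms(false, bytes);
    float pinned_ms   = time_h2d_copy_ms(true, bytes);

    // bytes / (ms * 1e6) converts bytes per millisecond to GB/s.
    printf("pageable: %.2f ms (%.2f GB/s)\n",
           pageable_ms, bytes / (pageable_ms * 1.0e6));
    printf("pinned:   %.2f ms (%.2f GB/s)\n",
           pinned_ms, bytes / (pinned_ms * 1.0e6));
    return 0;
}
```

Build it with nvcc and run it. The GB/s figure for the pinned case also gives you an idea of the effective bandwidth of your PCIe link, which helps when deciding whether a Gen3 system is worth it. Keep in mind that pinned memory is a limited resource, so pinning very large buffers can hurt the rest of the system.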