How to decrease cudaMemcpy time

I have learned that cudaMemcpy is the function used to copy memory from host to device.
How can I decrease its execution time? Should I upgrade the DRAM or the graphics card?

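Pin the host memory (allocate it with cudaMallocHost, or register an existing buffer with cudaHostRegister) so the copy does not have to go through an extra staging buffer.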
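
A rough sketch of how you could compare the two cases yourself is below. It times one host-to-device cudaMemcpy from an ordinary malloc buffer and one from a cudaMallocHost (pinned) buffer. The 256 MiB size, the CHECK macro and the timeCopy helper are just my choices for illustration, and the numbers you get will depend on your system.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: time one host-to-device copy, in milliseconds.
static float timeCopy(void* dst, const void* src, size_t bytes) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const size_t bytes = 256u << 20;  // 256 MiB, an arbitrary test size

    void* dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));

    // Ordinary pageable host memory.
    void* pageable = std::malloc(bytes);
    if (pageable == nullptr) {
        std::fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    std::memset(pageable, 1, bytes);  // touch the pages before timing

    // Page-locked (pinned) host memory.
    void* pinned = nullptr;
    CHECK(cudaMallocHost(&pinned, bytes));
    std::memset(pinned, 1, bytes);

    // Warm-up copy so one-time setup cost is not included in the timings.
    timeCopy(dev, pinned, bytes);

    const float msPageable = timeCopy(dev, pageable, bytes);
    const float msPinned = timeCopy(dev, pinned, bytes);

    const double gib = static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
    std::printf("pageable: %.2f ms (%.2f GiB/s)\n", msPageable,
                gib / (msPageable / 1000.0));
    std::printf("pinned:   %.2f ms (%.2f GiB/s)\n", msPinned,
                gib / (msPinned / 1000.0));

    CHECK(cudaFreeHost(pinned));
    std::free(pageable);
    CHECK(cudaFree(dev));
    return 0;
}
```

This should build with something like nvcc -o copytime copytime.cu (the file name is made up). Keep in mind that pinned memory cannot be paged out, so pinning very large buffers can hurt the rest of the system.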

If you have a Gen2 (or older) PCIe link, switch to a card and system that support a Gen3 link.
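
If you are not sure which PCIe generation your card is actually running at, a small NVML program can report it. This is only a sketch: it assumes the GPU of interest is device index 0 and that the NVML header and library from the driver/toolkit are installed. The reported current generation can be lower than the maximum while the GPU is idle, so it may be worth checking while a transfer is running.

```cuda
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    // Assumption: the GPU we care about is device index 0.
    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex(0, &dev);
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "no device 0: %s\n", nvmlErrorString(rc));
        nvmlShutdown();
        return 1;
    }

    unsigned int curGen = 0, maxGen = 0, curWidth = 0, maxWidth = 0;

    // Link generation and width currently negotiated with the system.
    rc = nvmlDeviceGetCurrPcieLinkGeneration(dev, &curGen);
    if (rc == NVML_SUCCESS) rc = nvmlDeviceGetCurrPcieLinkWidth(dev, &curWidth);
    // Maximum generation and width possible for this GPU in this system.
    if (rc == NVML_SUCCESS) rc = nvmlDeviceGetMaxPcieLinkGeneration(dev, &maxGen);
    if (rc == NVML_SUCCESS) rc = nvmlDeviceGetMaxPcieLinkWidth(dev, &maxWidth);

    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "PCIe query failed: %s\n", nvmlErrorString(rc));
        nvmlShutdown();
        return 1;
    }

    std::printf("current link: Gen%u x%u\n", curGen, curWidth);
    std::printf("maximum link: Gen%u x%u\n", maxGen, maxWidth);

    nvmlShutdown();
    return 0;
}
```

It should build with something like nvcc -o pcielink pcielink.cu -lnvidia-ml (again, the file name is made up).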