Slow device-to-host memcpy on a Tesla


I have a performance problem with device-to-host (D2H) memcpy on a Tesla.
It takes 5 seconds to copy 1 GB, which is only about 0.2 GB/s. That is far too slow.
The same code runs fast on other GPUs. The problem only happens with the Tesla.

Is this likely to be caused by an insufficient power supply?
The strange thing is that the only slow part is the D2H memcpy. Other operations, such as host-to-device (H2D) memcpy and numerical computation, work well.
Can insufficient power cause this kind of symptom?

I am facing the same problem. The speed almost halves when transferring from device to host. Can someone help out?

D2H being about 50% slower than H2D is normal; we have seen that on our system too.

I don’t know about OP’s problem though.

With pageable memory, I presume? On my box, D2H is about 10% slower than H2D with pinned memory, but about 70% slower with pageable memory.
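For anyone who wants to check the pinned versus pageable difference on their own machine, here is a minimal sketch. It is not code from anyone in this thread: the 256 MiB buffer size, the repetition count, and the helper names `time_copy` and `bandwidth_gbps` are my own assumptions, and the numbers it prints will depend on the board, chipset and driver.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Convert bytes and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
static double bandwidth_gbps(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / 1e9) / (ms / 1e3);
}

// Run one untimed warm-up copy, then time `reps` blocking copies with
// CUDA events. Returns the average time per copy in milliseconds.
static float time_copy(void* dst, const void* src, size_t bytes,
                       cudaMemcpyKind kind, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up
    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    const size_t bytes = 256u << 20;  // 256 MiB, an arbitrary test size
    const int reps = 5;               // arbitrary repetition count

    void* dev = nullptr;
    void* pinned = nullptr;
    void* pageable = malloc(bytes);   // ordinary pageable host memory
    if (!pageable) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    CHECK(cudaMalloc(&dev, bytes));
    CHECK(cudaMallocHost(&pinned, bytes));  // page-locked host memory

    // Touch the host buffers so their pages are really allocated.
    memset(pageable, 0, bytes);
    memset(pinned, 0, bytes);
    CHECK(cudaMemset(dev, 0, bytes));

    printf("pageable H2D: %.2f GB/s\n", bandwidth_gbps(bytes,
           time_copy(dev, pageable, bytes, cudaMemcpyHostToDevice, reps)));
    printf("pageable D2H: %.2f GB/s\n", bandwidth_gbps(bytes,
           time_copy(pageable, dev, bytes, cudaMemcpyDeviceToHost, reps)));
    printf("pinned   H2D: %.2f GB/s\n", bandwidth_gbps(bytes,
           time_copy(dev, pinned, bytes, cudaMemcpyHostToDevice, reps)));
    printf("pinned   D2H: %.2f GB/s\n", bandwidth_gbps(bytes,
           time_copy(pinned, dev, bytes, cudaMemcpyDeviceToHost, reps)));

    CHECK(cudaFreeHost(pinned));
    CHECK(cudaFree(dev));
    free(pageable);
    return 0;
}
```

It should build with something like `nvcc -o bwtest bwtest.cu` (the file name is just an example). For scale, 1 GB in 5 seconds works out to `bandwidth_gbps(1000000000, 5000.0f)`, which is 0.2 GB/s.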

With pinned memory, I got 5.7 GB/s H2D and 3.3 GB/s D2H when the NUMA CPU and GPU are correctly paired.

When they are not correctly paired, I got 5.4 GB/s H2D and 2.6 GB/s D2H.
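To see which NUMA node each GPU hangs off, something like the sketch below may help. It assumes Linux, where sysfs exposes a `numa_node` file per PCI device; the helper name `numa_node_path` is made up for this example, and a reported value of -1 means the kernel has no affinity information for that device.

```cuda
#include <cctype>
#include <cstdio>
#include <fstream>
#include <string>
#include <cuda_runtime.h>

// Build the Linux sysfs path that reports the NUMA node of a PCI device.
// sysfs directory names use lowercase hex, so lowercase the bus ID first.
static std::string numa_node_path(std::string bus_id) {
    for (char& c : bus_id)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return "/sys/bus/pci/devices/" + bus_id + "/numa_node";
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        // Filled in as domain:bus:device.function, all in hex.
        char bus_id[32] = {0};
        if (cudaDeviceGetPCIBusId(bus_id, sizeof(bus_id), dev) != cudaSuccess)
            continue;
        int node = -1;  // stays -1 if the file is missing or unreadable
        std::ifstream f(numa_node_path(bus_id));
        if (f) f >> node;
        printf("GPU %d at %s: NUMA node %d\n", dev, bus_id, node);
    }
    return 0;
}
```

Once the node is known, the benchmark process can be bound to the matching CPUs and memory, for example with `numactl --cpunodebind=0 --membind=0 ./bwtest` (the node number and binary name here are placeholders).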

That sounds like a dual Intel Tylersburg machine. The problems with those have been well documented elsewhere.

We tested again with the same GPU, a different motherboard, and a different power supply unit that provides enough power.
The slowdown did not occur with that setup. Accordingly, we concluded that the problem was caused by an insufficient power supply.

I did not know that a GPU slows down instead of hanging when the power supply is insufficient.

Thank you all for your comments.

I doubt that your conclusion is correct.