cudaMemcpy2D slow with Tesla 1060?


I have a Quadro FX 570 and a Tesla 1060 in the same PC. My application runs in about 0.2 s on the Tesla and 5 s on the Quadro. Good. But transferring the data from CPU to GPU is much slower on the Tesla than on the Quadro: it takes 4 ms on the latter and … 60 ms on the former! I used the 2.3 toolkit and compiled for compute capability 1.3 hardware (1.1 for the Quadro).

What is wrong?


Can you use the bandwidthTest example from the SDK to measure the bandwidth of the Tesla and the Quadro?

Yes, we ran the test this morning. The Quadro FX 570 in the same PC transfers data ten times faster than the Tesla! Maybe we have to swap the two cards (the Tesla is in slots 3 & 4)?


We found the solution: for some unknown reason, the Quadro was on a PCI Express x16 link and the Tesla on a PCI Express x1 link. The PC is a Dell Precision T3400.
We swapped the two cards and everything is fine now. The bandwidthTest sample in the SDK was really useful for diagnosing the situation.

