I have a Quadro FX 570 and a Tesla 1060 in the same PC. My application takes about 0.2 s on the Tesla and 5 s on the Quadro. Good. But transferring the data from CPU to GPU is much slower with the Tesla than with the Quadro: it takes 4 ms on the latter and … 60 ms on the former! I am using the CUDA 2.3 toolkit and compiling for compute capability 1.3 hardware (1.1 for the Quadro).
What is wrong?