I have a MATLAB (2011) program and a compiled CUDA program that I want to compare on a ‘Tesla M2050’ and a ‘Tesla K20Xm’.
The K20Xm is much slower than the M2050.
ECC is enabled on both cards.
I tried to compile the CUDA code with -arch=sm_35, but I got the error: ‘Error using handleKernelArgs’
How can I investigate this? Which parameters should I check in order to find the cause of the slowness?
(I am new to this area.)
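
For reference, here is a minimal sketch of how the basic parameters of both cards could be printed for a side-by-side comparison: compute capability, ECC state, clocks, and memory bus width. It assumes a CUDA toolkit of version 5.0 or later (for `cudaDeviceGetAttribute`). The helpers `peak_bandwidth_gbs` and `attr` are hypothetical names introduced only for this example, and the bandwidth figure is a theoretical estimate that assumes double-data-rate memory, not a measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: theoretical peak memory bandwidth in GB/s.
// Assumes double-data-rate memory (two transfers per memory clock).
double peak_bandwidth_gbs(int memClockKHz, int busWidthBits) {
    return 2.0 * memClockKHz * 1e3 * (busWidthBits / 8.0) / 1e9;
}

// Hypothetical helper: read one integer device attribute, -1 on failure.
static int attr(cudaDeviceAttr a, int dev) {
    int v = 0;
    if (cudaDeviceGetAttribute(&v, a, dev) != cudaSuccess) v = -1;
    return v;
}

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        int memClk = attr(cudaDevAttrMemoryClockRate, dev);       // kHz
        int busW   = attr(cudaDevAttrGlobalMemoryBusWidth, dev);  // bits

        printf("Device %d: %s\n", dev, prop.name);
        printf("  compute capability : %d.%d\n",
               attr(cudaDevAttrComputeCapabilityMajor, dev),
               attr(cudaDevAttrComputeCapabilityMinor, dev));
        printf("  multiprocessors    : %d\n", attr(cudaDevAttrMultiProcessorCount, dev));
        printf("  core clock (kHz)   : %d\n", attr(cudaDevAttrClockRate, dev));
        printf("  memory clock (kHz) : %d\n", memClk);
        printf("  memory bus (bits)  : %d\n", busW);
        printf("  ECC enabled        : %d\n", attr(cudaDevAttrEccEnabled, dev));
        if (memClk > 0 && busW > 0)
            printf("  peak bandwidth     : %.1f GB/s (theoretical)\n",
                   peak_bandwidth_gbs(memClk, busW));
    }
    return 0;
}
```

This could be built with `nvcc` and run once on each machine. If the reported compute capability of the K20Xm is 3.5 while the kernel was built for an older architecture, that mismatch would be one of the first things to look at.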