Depends on the application, for sure. But let's just say the problem mainly involves matrix addition, multiplication, division, and interpolation, and it has quite a few if statements. The i7 is running C code, and the Tesla is obviously running CUDA. The programmer, who is me, is not completely and utterly hopeless and can write some OK code.
How much faster can CUDA (Tesla 1060) be?
By the way, each matrix in the problem usually has about 60,000 elements.
That is impossible to answer. There are two theoretical peak limits: a memory bandwidth limit, which corresponds to a speedup of about 8x, and a compute speed limit, which corresponds to a factor of about 10x for single precision arithmetic and about 2x for double precision. But those are completely theoretical limits, and achieving peak performance on either device isn’t easy. The GPU also has some fixed-function hardware for textures and filtering that can provide a lot of speedup at almost zero cost. The CPU has very fast vector units that can also provide an enormous speedup, but most people don’t know how to make their code (or their compilers) generate instructions that will exploit them.
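To show roughly where ratios like that come from, here is a back-of-envelope sketch. The peak figures are nominal, datasheet-style numbers I am assuming for a Tesla C1060-class card and a Nehalem-era i7, not measured values, so treat the output as an illustration of the two ceilings, not a benchmark.

```python
# Back-of-envelope speedup ceilings from ASSUMED peak figures.
# Every number here is a nominal/assumed value, not a measurement.

# Assumed peaks for a Tesla C1060-class GPU
gpu_bw_gbs    = 102.0   # memory bandwidth, GB/s
gpu_sp_gflops = 933.0   # single-precision peak, GFLOP/s
gpu_dp_gflops = 78.0    # double-precision peak, GFLOP/s

# Assumed peaks for a Nehalem-era Core i7 (one socket)
cpu_bw_gbs    = 12.8    # effective memory bandwidth, GB/s
cpu_sp_gflops = 96.0    # 4 cores x 8 SP FLOPs/cycle x ~3 GHz
cpu_dp_gflops = 48.0    # half the single-precision rate

# Whichever ceiling your kernel hits first bounds the real speedup.
print(f"memory-bound ceiling:      {gpu_bw_gbs / cpu_bw_gbs:.1f}x")
print(f"SP compute-bound ceiling:  {gpu_sp_gflops / cpu_sp_gflops:.1f}x")
print(f"DP compute-bound ceiling:  {gpu_dp_gflops / cpu_dp_gflops:.1f}x")
```

A kernel dominated by streaming loads and stores sits against the memory-bound ceiling; one dominated by arithmetic sits against the compute-bound one. Real code usually lands well under both.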
If you want to see how difficult and arbitrary the question you are asking is, take a few minutes and read this thread.
I will say that 60,000-element matrices are rather small, and you might struggle to achieve good performance on the Tesla, just because your problem doesn’t contain enough computation to fully exploit the hardware.
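To put a rough number on that, here is a sketch of the fixed costs for one element-wise operation on a matrix that size. The PCIe throughput, launch overhead, and GPU bandwidth are all assumed round figures, not measurements:

```python
# Rough overhead estimate for one element-wise op (C = A + B)
# on a small matrix. All figures are ASSUMED round numbers.

elements   = 60_000
bytes_each = 4                       # single precision
size_bytes = elements * bytes_each   # 240 KB per matrix

pcie_bps = 5.0e9       # assumed effective PCIe throughput, bytes/s
launch_s = 10e-6       # assumed kernel launch overhead, seconds
gpu_bw   = 102.0e9     # assumed GPU memory bandwidth, bytes/s

# On-device traffic: read A, read B, write C -> 3x the matrix size
compute_s  = 3 * size_bytes / gpu_bw
# Host<->device traffic: copy A and B up, copy C back -> 3 transfers
transfer_s = 3 * size_bytes / pcie_bps

print(f"on-GPU memory work: {compute_s  * 1e6:6.1f} us")
print(f"PCIe transfers:     {transfer_s * 1e6:6.1f} us")
print(f"kernel launch:      {launch_s   * 1e6:6.1f} us")
```

Under these assumptions the PCIe traffic alone costs on the order of 20x the kernel's actual memory work, which is why small problems rarely see the theoretical speedups unless the data can stay resident on the GPU across many operations.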
Good answer. I find it particularly interesting that Intel's silicon design is far and away more advanced than GPU processors, but the power lies in the ability to parallelise the process/algorithm (or otherwise exploit the specialised nature of the GPU).