Tesla C2050 vs Tesla T10 processor: which is normally faster?

Hello

I’m running CUDA code on relatively large graphs/sparse matrices (40,000 to 800,000 nodes) and I’m getting better performance on the Tesla T10 than on the Tesla C2050. Is that normal? I was expecting the opposite, since the C2050 has more cores. Is there any reason why this could be happening?

The Tesla C2050 is in a computer with an 8-core Intel i7 CPU, while the Tesla T10 is in a Tesla S1070 system, but I’m using only one of the S1070’s T10 GPUs (I haven’t done anything to use more).

Thanks

Did you recompile and tune code for Fermi?

I recompiled the code but didn’t tune it, apart from specifying that it should prefer a larger L1 cache over shared memory. Is there any specific way to tune it?

I forgot to mention that I have many threads (10,000 - 150,000) depending on the size of the graph and I’m mostly using global memory and registers, hardly any shared memory.
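
For reference, this is roughly how the L1 preference can be requested per kernel through the CUDA runtime API. It is only a minimal sketch, not my actual code: the kernel `relaxNodes`, its body, and the sizes are hypothetical stand-ins for a one-thread-per-node kernel that touches only global memory and registers. It assumes the code is built for Fermi (compute capability 2.0, e.g. `nvcc -arch=sm_20` on a toolkit that still supports that target).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the graph code: one thread per node,
// reading and writing global memory only, no shared memory.
__global__ void relaxNodes(const float *in, float *out, int numNodes)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numNodes)
        out[i] = in[i] + 1.0f;
}

int main()
{
    const int numNodes = 150000;  // assumed size, within the range quoted above
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, numNodes * sizeof(float));
    cudaMalloc(&out, numNodes * sizeof(float));
    cudaMemset(in, 0, numNodes * sizeof(float));

    // Request the larger L1 split for this kernel (48 KB L1 / 16 KB shared
    // memory on Fermi). This is a preference, not a guarantee, and it has no
    // effect on devices without a configurable split such as the T10.
    cudaFuncSetCacheConfig(relaxNodes, cudaFuncCachePreferL1);

    const int threads = 256;
    const int blocks = (numNodes + threads - 1) / threads;
    relaxNodes<<<blocks, threads>>>(in, out, numNodes);

    // Wait for the kernel and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

The setting is applied per kernel, so each kernel that should prefer L1 needs its own `cudaFuncSetCacheConfig` call before it is launched.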
