I just got a Tesla C1060, dropped it into my Linux x64 build, and installed the latest (190.53) drivers. I ran the benchmarks that come with the CUDA SDK and found performance to be far below what a Tesla is supposed to deliver. The nbody simulation gives 378.260 GFLOPS on 131072 bodies. It is my understanding that the Tesla is rated at roughly 933 GFLOPS for single precision, so this is quite a disparity.

nvidia-settings shows two performance levels for the Tesla: level 0 at 400MHz core (NVClock) and 300MHz memory, and level 1 at 610MHz core and 800MHz memory. My GTX275, on the other hand, has a third level (at which it always seems to operate) at 648MHz core and 1188MHz memory. I’ve actually noticed significantly higher performance on the 275 for numbers of bodies that fit in its memory (obviously the Tesla wins with 131072 bodies).

The Tesla does run pretty hot (82C on nbody with 131072 bodies), but I still feel like there’s quite a bit more performance to be squeezed out. Is there something I’m doing wrong that keeps me from getting top performance out of the card?
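For reference, here is a rough sketch of how I understand the 933 GFLOPS figure to be derived, and how my nbody number compares to it. The core count (240), shader clock (1.296 GHz), and flops-per-clock values are assumptions taken from the published spec sheet, not anything I measured or read from nvidia-settings, and `peak_gflops` is just a helper name I made up for this post.

```python
# Back-of-the-envelope theoretical peak for the Tesla C1060.
# ASSUMPTIONS (spec-sheet values, not measured on my machine):
#   240 stream processors, 1.296 GHz shader clock,
#   3 flops per processor per clock (dual-issued MAD + MUL),
#   or 2 flops per clock if only the MAD is counted.

def peak_gflops(cores, shader_clock_ghz, flops_per_clock):
    """Theoretical single-precision peak in GFLOPS."""
    return cores * shader_clock_ghz * flops_per_clock

# Number reported by the SDK nbody benchmark on 131072 bodies.
MEASURED_NBODY_GFLOPS = 378.260

peak_mad_mul = peak_gflops(240, 1.296, 3)   # marketing peak
peak_mad_only = peak_gflops(240, 1.296, 2)  # MAD-only peak

print(f"peak (MAD+MUL):  {peak_mad_mul:.2f} GFLOPS")
print(f"peak (MAD only): {peak_mad_only:.2f} GFLOPS")
print(f"nbody vs MAD+MUL peak:  {MEASURED_NBODY_GFLOPS / peak_mad_mul:.1%}")
print(f"nbody vs MAD-only peak: {MEASURED_NBODY_GFLOPS / peak_mad_only:.1%}")
```

As far as I can tell, the clock in this formula is the shader clock, which is a different number from the 610MHz core clock that nvidia-settings reports, so those two can't be compared directly.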