Tesla C2050

Hi All

I ran the particle system on a Tesla device with 240 cores, and the frame rate was
higher than on the Tesla device with 448 cores. It should be higher on the 448-core
device because it has more cores.

Could you please tell me why?

It’s hard to say without knowing more about your system (OS, driver etc.).

The particles SDK code is certainly much faster on Fermi - I get about 460 fps for 65K particles on a GTX 480 versus 175 fps on a GTX 280.

Shadi, since Simon is the person who wrote the particles code (am I right?), I won't dare add anything to what he said :)

Simon, does the nearest-neighbor search in the particles code still benefit from the sort on the new Fermi architecture? (Sorry, Shadi, for hijacking your post.)

I also find this interesting, since the only spec on which the C2050 falls short of the C10 series is a slightly lower core clock. Everything else is superior. Should the OS and driver play such an important role? That is also interesting.

Yes, I wrote that code, for better or worse.

Re-ordering the particles into sorted order does still benefit Fermi, although doing the binning using atomics seems to be slightly faster than using the radix sort on Fermi.
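Binning with atomics is essentially a counting sort: each particle atomically increments the counter of its grid cell (keeping the returned old value as its slot), the counts are prefix-summed into per-cell start offsets, and each particle is scattered to `start + slot`. The following is a minimal CUDA sketch of that idea, not the SDK sample's code; the 1D grid, the names `countCells`, `scatter` and `binParticles`, and the host-side scan are hypothetical simplifications for illustration.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Pass 1 (hypothetical): find each particle's cell in a 1D grid and atomically
// reserve a slot in that cell. atomicAdd returns the old count, i.e. our slot.
__global__ void countCells(const float* pos, int n, float cellSize, int numCells,
                           int* cellCount, int* cellOf, int* slotOf) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int c = (int)(pos[i] / cellSize);
    c = min(max(c, 0), numCells - 1);            // clamp to the grid bounds
    cellOf[i] = c;
    slotOf[i] = atomicAdd(&cellCount[c], 1);
}

// Pass 2: write each particle index to its cell's start offset plus its slot.
__global__ void scatter(const int* cellOf, const int* slotOf, const int* cellStart,
                        int n, int* sortedIndex) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    sortedIndex[cellStart[cellOf[i]] + slotOf[i]] = i;
}

// Host wrapper: fills hSorted with particle indices grouped by cell and hStart
// with each cell's first position in hSorted.
void binParticles(const std::vector<float>& hPos, float cellSize, int numCells,
                  std::vector<int>& hSorted, std::vector<int>& hStart) {
    assert(numCells > 0 && !hPos.empty());
    int n = (int)hPos.size();
    float* dPos;
    int *dCount, *dCell, *dSlot, *dStart, *dSorted;
    cudaMalloc(&dPos, n * sizeof(float));
    cudaMalloc(&dCount, numCells * sizeof(int));
    cudaMalloc(&dCell, n * sizeof(int));
    cudaMalloc(&dSlot, n * sizeof(int));
    cudaMalloc(&dStart, numCells * sizeof(int));
    cudaMalloc(&dSorted, n * sizeof(int));
    cudaMemcpy(dPos, hPos.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, numCells * sizeof(int));

    int threads = 256, blocks = (n + threads - 1) / threads;
    countCells<<<blocks, threads>>>(dPos, n, cellSize, numCells, dCount, dCell, dSlot);

    // Exclusive prefix sum of the counts gives each cell's start offset.
    // Done on the host for brevity; a real implementation would scan on the GPU.
    std::vector<int> hCount(numCells);
    cudaMemcpy(hCount.data(), dCount, numCells * sizeof(int), cudaMemcpyDeviceToHost);
    hStart.assign(numCells, 0);
    for (int c = 1; c < numCells; ++c) hStart[c] = hStart[c - 1] + hCount[c - 1];
    cudaMemcpy(dStart, hStart.data(), numCells * sizeof(int), cudaMemcpyHostToDevice);

    scatter<<<blocks, threads>>>(dCell, dSlot, dStart, n, dSorted);
    hSorted.resize(n);
    cudaMemcpy(hSorted.data(), dSorted, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dPos); cudaFree(dCount); cudaFree(dCell);
    cudaFree(dSlot); cudaFree(dStart); cudaFree(dSorted);
}
```

One design difference from a radix sort: the order of particles within a cell is not deterministic here, because it depends on the order in which the atomic increments complete. For neighbor searches that only need "all particles in this cell", that usually does not matter.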

This code benefits from the new L1 cache too: if you configure the cache as 48 KB and use global loads instead of texture fetches to read the positions and velocities, it's about 10% faster.
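On Fermi, each SM's 64 KB of on-chip memory can be split as 48 KB L1 / 16 KB shared memory or the reverse, and `cudaFuncSetCacheConfig` with `cudaFuncCachePreferL1` requests the larger L1 for a given kernel. The sketch below is a hypothetical integration step (`integrate` and `stepParticles` are illustrative names, not the SDK's) that reads positions and velocities with plain global loads, which Fermi caches in L1 by default, instead of `tex1Dfetch`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: positions and velocities are read with ordinary global
// loads (no texture fetches), so on Fermi they can be served from L1.
__global__ void integrate(float4* pos, const float4* vel, float dt, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 p = pos[i];
    float4 v = vel[i];
    p.x += v.x * dt;
    p.y += v.y * dt;
    p.z += v.z * dt;   // w is left unchanged
    pos[i] = p;
}

// Requests the 48 KB L1 / 16 KB shared split for `integrate`, then runs one step.
cudaError_t stepParticles(std::vector<float4>& hPos, const std::vector<float4>& hVel,
                          float dt) {
    int n = (int)hPos.size();
    size_t bytes = n * sizeof(float4);
    cudaError_t err = cudaFuncSetCacheConfig(integrate, cudaFuncCachePreferL1);
    if (err != cudaSuccess) return err;

    float4 *dPos = nullptr, *dVel = nullptr;
    cudaMalloc(&dPos, bytes);
    cudaMalloc(&dVel, bytes);
    cudaMemcpy(dPos, hPos.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dVel, hVel.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    integrate<<<blocks, threads>>>(dPos, dVel, dt, n);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hPos.data(), dPos, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dPos);
    cudaFree(dVel);
    return err;
}
```

The cache setting is a per-kernel preference rather than a guarantee, so whether it pays off (the roughly 10% quoted above) is worth measuring on your own kernel and data.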

Other people might throw FLOPs at the problem, but at least we’re innovating the GPU architecture!

I am still waiting to hear whether the code has been run on a Tesla S20 card without problems, and if there is a problem, what it is.