Titan X low performance in nbody sample code

I’m now using a Titan X with CUDA 5.5 for CUDA programming.
However, I’ve run into a problem with low performance.

When I run the ‘nbody’ sample via the ‘Run’ button in the CUDA Sample Browser, the performance looks good.
But when I ran the nbody code manually from Visual Studio 2010, the performance was really bad.

I suspect that the performance shown by the Sample Browser is not actually coming from my GPU…

More details on the performance are below.

  1. via CUDA Sample Browser
    228 fps, 133 BIPS, 2800 GFLOP/s

  2. via Visual Studio 2010, manually
    1.9 fps, 1.2 BIPS, 23.5 GFLOP/s

Also, CUDA 7.0 only lets me run the sample manually, and there I see the same problem as with CUDA 5.5:
really low performance…

Does anybody have an idea?
How can I fix this to get high performance?

Make sure you are compiling for a release build, not a debug build. In particular, the nvcc command line should not include the switch -G.

I have not run the nbody sample app in a very long time, but as I recall it has a command line switch that turns on benchmarking mode, which is disabled by default. Without it, performance suffers noticeably due to CUDA/OpenGL interop overhead.

Oh, yes, I was already compiling a release build.

So your point is that I have to turn on benchmarking mode for the Titan X.
Does that mean I have to change parts of the nbody example code?

The app has a number of settings: number of particles, single/double precision, benchmark mode, etc.

Unless you have all these matched, you should not expect the same performance.

All of these settings are command-line switches; they don’t require code changes.

You can get a list of all the command line options by adding the --help command line switch.
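
For example, here is a minimal Python sketch that assembles a command line with the settings spelled out explicitly, so two runs can be compared like for like. The switch names -benchmark, -numbodies=<N> and -fp64 are what I recall from the sample's --help output, so please verify them there. The nbody.exe path and the helper name nbody_command are hypothetical.

```python
import subprocess
from pathlib import Path


def nbody_command(exe, numbodies=None, benchmark=True, fp64=False):
    """Build the argument list for one run of the nbody sample.

    Switch names are assumed from the sample's --help output.
    """
    args = [str(exe)]
    if benchmark:
        # Benchmark mode is assumed to skip the OpenGL rendering path.
        args.append("-benchmark")
    if numbodies is not None:
        args.append(f"-numbodies={numbodies}")
    if fp64:
        # Double precision; leave off to stay in single precision.
        args.append("-fp64")
    return args


if __name__ == "__main__":
    exe = Path("nbody.exe")  # hypothetical location of the built sample
    cmd = nbody_command(exe, numbodies=32768)
    print(" ".join(cmd))
    if exe.exists():
        # Only launch the sample if the executable is actually present.
        subprocess.run(cmd, check=True)
```

In Visual Studio, the same switches can be entered as Command Arguments in the project's Debugging properties, so no code change is needed.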


Thank you for your reply!

You mean I have to match the setting values for the Titan X?
Is there any reference from NVIDIA for these settings?

No, I mean that if the CUDA sample browser gives you this:

  1. via CUDA Sample Browser
    228 fps, 133 BIPS, 2800 GFLOP/s

then you’ll need to figure out what settings gave you that, and then you can feed those settings to the program when you run it “manually”.


Oh, I get what you mean!
I will try it right now. Thank you very much!

Hi joy4162 - just to get an idea about your Titan X vs. my original Titans, can you run the full-speed nbody sample with -numbodies=32768, e.g.

…\win64\Release>nbody.exe -numbodies=32768

Running this with the OpenGL GUI under CUDA 6.5, my GK110 Titans report between 1600 and 1800 GFLOPS.