It installed this branch: nvidia-384, the latest, yet when I launch nbody I get:
% ./nbody
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Error: only 0 Devices available, 1 requested. Exiting.
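A "0 Devices available" message right after a driver install often means the kernel driver is not loaded yet (for instance, no reboot since installing). That is an assumption here; nothing in the output above confirms it. A minimal Python sketch for checking the usual signs on Linux follows. The function name driver_report is hypothetical. The paths /dev/nvidia0 and /proc/driver/nvidia/version and the command nvidia-smi -L are the standard ones shipped with the NVIDIA Linux driver.

```python
import glob
import os
import shutil
import subprocess


def driver_report():
    """Collect basic signs that the NVIDIA kernel driver is loaded (Linux only)."""
    version_file = "/proc/driver/nvidia/version"
    report = {
        # Nodes such as /dev/nvidia0 normally exist only once the kernel module is loaded.
        "device_nodes": sorted(glob.glob("/dev/nvidia[0-9]*")),
        "driver_version": None,
        "nvidia_smi": None,
    }
    if os.path.exists(version_file):
        with open(version_file) as fh:
            # The first line names the loaded kernel module and its version.
            report["driver_version"] = fh.readline().strip()
    smi = shutil.which("nvidia-smi")
    if smi:
        # "nvidia-smi -L" lists the GPUs the driver can see, one per line.
        result = subprocess.run([smi, "-L"], capture_output=True, text=True)
        report["nvidia_smi"] = (result.stdout or result.stderr).strip()
    return report


if __name__ == "__main__":
    for key, value in driver_report().items():
        print(f"{key}: {value}")
```

If device_nodes is empty and driver_version is None, the CUDA runtime has no device to enumerate, which would be consistent with the error above.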
Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "GeForce GTX 1060 6GB" with compute capability 6.1
Compute 6.1 CUDA device: [GeForce GTX 1060 6GB]
number of bodies = 4096
4096 bodies, total time for 10 iterations: 2.738 ms
= 61.272 billion interactions per second
= 1225.430 single-precision GFLOP/s at 20 flops per interaction
./nbody -numbodies=4096 -cpu -benchmark
OUTPUT:
Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
Simulation with CPU
number of bodies = 4096
4096 bodies, total time for 10 iterations: 7487.320 ms
= 0.022 billion interactions per second
= 0.448 single-precision GFLOP/s at 20 flops per interaction
It seems only 1 out of 12 CPU cores (threads) is used for this simulation.
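To put the two runs side by side, the reported rates can be recomputed from the timings. This is a small Python sketch; the helper name nbody_rates is made up for illustration. It assumes an all-pairs simulation (N * N interactions per iteration) and the 20 flops per interaction that the output itself states. The 12x figure at the end is a hypothetical best case with perfect scaling across all 12 threads, not a measurement.

```python
def nbody_rates(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions per second, GFLOP/s) for an all-pairs run."""
    # All-pairs: every body interacts with every body, each iteration.
    interactions = num_bodies ** 2 * iterations
    per_second = interactions / (total_ms / 1000.0)
    return per_second / 1e9, per_second * flops_per_interaction / 1e9


# Timings copied from the two benchmark outputs above.
gpu_rate, gpu_gflops = nbody_rates(4096, 10, 2.738)
cpu_rate, cpu_gflops = nbody_rates(4096, 10, 7487.320)

print(f"GPU: {gpu_rate:.3f} billion interactions/s, {gpu_gflops:.1f} GFLOP/s")
print(f"CPU: {cpu_rate:.3f} billion interactions/s, {cpu_gflops:.3f} GFLOP/s")
print(f"GPU is about {7487.320 / 2.738:.0f}x faster than the single-threaded CPU run")

# Hypothetical upper bound if the CPU path scaled perfectly over 12 threads.
print(f"Ideal 12-thread CPU: {cpu_gflops * 12:.2f} GFLOP/s")
```

The recomputed rates match the reported ones to within the rounding of the printed times. Even the ideal 12-thread estimate stays in the single-digit GFLOP/s range, so the single-threaded CPU path explains only a small part of the gap.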