Hi, when I used the nbody sample to test the performance of a Tesla P4 device, the command was like this:
./nbody -benchmark -numbodies=256000
And the result I get is like this:
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Tesla P4" with compute capability 6.1
Compute 6.1 CUDA device: [Tesla P4]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 3456.762 ms
= 189.588 billion interactions per second
= 3791.757 single-precision GFLOP/s at 20 flops per interaction
But the P4 datasheet lists single-precision performance as 5500 GFLOP/s, so the benchmark reaches only about 69% of that figure.
What is the problem, and what should I do to get the 5500 GFLOP/s result?
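For reference, here is a small Python sketch of how I believe the reported numbers are derived. It assumes the sample counts N * N body-body interactions per iteration, which matches the printed output; the variable names are my own and nothing here is taken from the nbody source.

```python
# Values copied from the nbody output above.
num_bodies = 256000
iterations = 10
total_time_ms = 3456.762
flops_per_interaction = 20      # as stated in the output
datasheet_gflops = 5500.0       # single-precision figure from the P4 datasheet

# Assumption: all-pairs simulation, so N * N interactions per iteration.
interactions = num_bodies ** 2 * iterations
interactions_per_second = interactions / (total_time_ms / 1000.0)

# Convert to GFLOP/s using the flop count per interaction from the output.
measured_gflops = interactions_per_second * flops_per_interaction / 1e9
fraction_of_datasheet = measured_gflops / datasheet_gflops

print(f"{interactions_per_second / 1e9:.3f} billion interactions per second")
print(f"{measured_gflops:.3f} GFLOP/s measured")
print(f"{fraction_of_datasheet:.1%} of the datasheet figure")
```

So the GFLOP/s value printed by nbody is an estimate based on 20 flops per interaction, not a direct hardware measurement.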
Best regards.