The system I am using is as follows:
- OS: Windows 11
- Driver Version: 512.95
- GeForce RTX 2080 Super with Max-Q Design
When I measure performance with the nbody sample using the following command, I see 600 to 800 GFLOP/s, which is only about 1/6 of the expected performance.
$ docker run --rm -it --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -numbodies=128000
... (unnecessary output omitted)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5
> Compute 7.5 CUDA device: [NVIDIA GeForce RTX 2080 Super with Max-Q Design]
number of bodies = 128000
128000 bodies, total time for 10 iterations: 4913.921 ms
= 33.342 billion interactions per second
= 666.840 single-precision GFLOP/s at 20 flops per interaction
^^^^^^^
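As a sanity check, the GFLOP/s figure can be reproduced from the other numbers in this output. The sketch below assumes the sample counts one interaction per ordered pair of bodies per iteration (N × N); that assumption is consistent with the printed values but is not stated in the output itself. The 5000 GFLOP/s reference is the figure I see when no problem occurs.

```python
# Values copied from the nbody output above.
num_bodies = 128_000
iterations = 10
total_ms = 4913.921
flops_per_interaction = 20  # as printed by the sample

# Assumption: every body interacts with every body once per iteration.
interactions = num_bodies ** 2 * iterations
interactions_per_s = interactions / (total_ms / 1000.0)
gflops = interactions_per_s * flops_per_interaction / 1e9

# Reference figure observed when the GPU runs at full speed on AC power.
expected_gflops = 5000.0
fraction = gflops / expected_gflops

print(f"{interactions_per_s / 1e9:.3f} billion interactions per second")
print(f"{gflops:.3f} GFLOP/s")
print(f"fraction of expected: {fraction:.3f}")
```

Since the result matches the reported 666.840 GFLOP/s, the low number reflects a genuinely long run time (about 4.9 s for 10 iterations), not a reporting quirk.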
GPU utilization during the run was 100%:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02 Driver Version: 512.95 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 57C P5 39W / N/A | 262MiB / 8192MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1 C /nbody N/A |
+-----------------------------------------------------------------------------+
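Note that the table reports performance state P5 even at 100% utilization, whereas P0 is the highest performance state. A minimal sketch for logging the performance state and clocks while the benchmark runs, assuming `nvidia-smi` is on the PATH. The helper names `build_command` and `parse_csv` are hypothetical, and some fields may report `[N/A]` on laptop GPUs.

```python
import csv
import io
import shutil
import subprocess

# Query fields supported by `nvidia-smi --query-gpu`.
FIELDS = [
    "pstate",
    "clocks.sm",
    "clocks.max.sm",
    "power.draw",
    "temperature.gpu",
    "utilization.gpu",
]


def build_command(fields=FIELDS):
    """Return the nvidia-smi command line for the given query fields."""
    return ["nvidia-smi", "--query-gpu=" + ",".join(fields), "--format=csv,noheader"]


def parse_csv(text, fields=FIELDS):
    """Parse header-less CSV output into one dict per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if row:
            rows.append(dict(zip(fields, (value.strip() for value in row))))
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(build_command(), capture_output=True, text=True)
        if result.returncode != 0:
            print("nvidia-smi failed:", result.stderr.strip())
        else:
            for gpu in parse_csv(result.stdout):
                print(gpu)
```

Running this in a second terminal during the benchmark would show whether the SM clock stays far below `clocks.max.sm` while the slowdown is occurring.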
When no problems occur, I get about 5000 GFLOP/s on AC power and about 2500 GFLOP/s on battery power, but this is very rare.
Is it possible to always have maximum performance?
Here is a list of what I have tried:
- I ran a game benchmark (the FF14 benchmark tool). The score was about 1/5 of the score reported in a blog post on the web, so performance is poor not only in Docker but also natively on Windows.
- I don’t have exact numbers, but compute shaders in the software I am developing in Unity are also noticeably slower.
- My current PC is a Razer Blade laptop, but I have had similar problems with a Dell XPS in the past.
- In the NVIDIA Control Panel, I set “Manage 3D settings” → “Power management mode” to “Prefer maximum performance”, but this did not improve performance.