GPU performance is very poor

The system I am using is as follows:

  • OS: Windows 11
  • Driver Version: 512.95
  • GeForce RTX 2080 Super with Max-Q Design

When I measure performance with the nbody sample using the following command, I see 600 to 800 GFLOP/s, which is only about 1/6 of the expected performance.

$ docker run --rm -it --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -numbodies=128000

... (Unnecessary parts omitted.)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5

> Compute 7.5 CUDA device: [NVIDIA GeForce RTX 2080 Super with Max-Q Design]
number of bodies = 128000
128000 bodies, total time for 10 iterations: 4913.921 ms
= 33.342 billion interactions per second
= 666.840 single-precision GFLOP/s at 20 flops per interaction   
  ^^^^^^^

GPU utilization during the run is 100%:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 512.95       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   57C    P5    39W /  N/A |    262MiB /  8192MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         1      C   /nbody                          N/A      |
+-----------------------------------------------------------------------------+


When no problems are occurring, I get about 5000 GFLOP/s on AC power and about 2500 GFLOP/s on battery power, but it is very rare for the GPU to reach these numbers.
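
For reference, the GFLOP/s figure in the benchmark output above can be reproduced from the reported body count and elapsed time. This is only a minimal sketch in Python: the function name `nbody_gflops` is made up for illustration, and it assumes the benchmark counts one interaction per ordered pair of bodies per iteration (my assumption, though it matches the reported numbers). The 5000 GFLOP/s baseline is the approximate figure I see on AC power.

```python
def nbody_gflops(num_bodies: int, iterations: int, total_ms: float,
                 flops_per_interaction: int = 20) -> float:
    """Return GFLOP/s, assuming num_bodies**2 interactions per iteration."""
    interactions = num_bodies ** 2 * iterations
    seconds = total_ms / 1000.0
    return interactions / seconds * flops_per_interaction / 1e9


# Values copied from the benchmark output above.
measured = nbody_gflops(num_bodies=128000, iterations=10, total_ms=4913.921)

# Approximate figure observed on AC power when nothing is wrong.
expected = 5000.0

print(f"measured: {measured:.1f} GFLOP/s")
print(f"fraction of expected: {measured / expected:.2f}")
```

For this particular run the ratio comes out somewhat below 1/6; the 1/6 estimate corresponds to the upper end of the 600 to 800 GFLOP/s range.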

Is it possible to always have maximum performance?

Here is a list of what I have tried:

  • I ran a game benchmark (the FF14 benchmark tool). The score was about 1/5 of the score reported in a blog post on the web, so performance is poor not only in Docker but also natively on Windows.

  • I don’t have exact numbers, but the ComputeShader in the software I am developing in Unity also runs noticeably slower.

  • My current PC is a Razer Blade laptop, but I have had similar problems with a Dell XPS in the past.

  • I have set “Manage 3D settings” → “Power management mode” to “Prefer maximum performance” in the NVIDIA settings, but this did not improve performance.