Reproducible Bug in reported GPU-Utilization

I have found out how to reproduce and trigger the bug where GPU-Utilization is reported as 0%.

System: Ubuntu 16.04
Driver: 367.27
GPU: GTX 1080

First, see the following nvidia-smi output.
My GPU is idle and the GPU-Utilization shows 0%.
So far so good.

root@ht:~# nvidia-smi
Sat Jun 18 14:52:08 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 40%   53C    P0    35W / 180W |     10MiB /  8112MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
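Comparing the utilization and power figures across runs is easier if the GPU row is extracted from the nvidia-smi table by a script. The following is a rough Python sketch. It assumes the row layout shown above for driver 367.27, and other driver versions may print the table differently. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches the second line of a GPU entry in the default nvidia-smi table, e.g.
# "| 40%   53C    P0    35W / 180W |     10MiB /  8112MiB |      0%   E. Process |"
# Assumption: layout as printed by driver 367.27; other versions may differ.
ROW_RE = re.compile(
    r"\|\s*(?P<fan>\d+)%\s+(?P<temp>\d+)C\s+(?P<perf>P\d+)\s+"
    r"(?P<pwr>\d+)W\s*/\s*(?P<cap>\d+)W\s*\|\s*"
    r"(?P<mem>\d+)MiB\s*/\s*(?P<mem_total>\d+)MiB\s*\|\s*"
    r"(?P<util>\d+)%"
)


@dataclass
class GpuRow:  # hypothetical container for one parsed row
    fan_pct: int
    temp_c: int
    perf: str
    power_w: int
    power_cap_w: int
    mem_used_mib: int
    mem_total_mib: int
    util_pct: int


def parse_row(line: str) -> GpuRow:
    """Parse one fan/temp/power/memory/utilization row of the nvidia-smi table."""
    m = ROW_RE.search(line)
    if m is None:
        raise ValueError(f"not an nvidia-smi GPU row: {line!r}")
    g = m.groupdict()
    return GpuRow(
        fan_pct=int(g["fan"]),
        temp_c=int(g["temp"]),
        perf=g["perf"],
        power_w=int(g["pwr"]),
        power_cap_w=int(g["cap"]),
        mem_used_mib=int(g["mem"]),
        mem_total_mib=int(g["mem_total"]),
        util_pct=int(g["util"]),
    )
```

Feeding it the idle row above yields `util_pct == 0` and `power_w == 35`. The buggy row shown further down also yields `util_pct == 0`, but with `power_w == 196`.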

Now, in another terminal, I’ll run a compute-intensive task.
This task drives the GPU to 100% load.
GPU-Utilization is expected to show 100%.
As you can see, it does, which is exactly how it should be.

root@ht:~# nvidia-smi
Sat Jun 18 14:57:31 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 27%   62C    P2   189W / 217W |   1027MiB /  8112MiB |    100%   E. Process |
+-------------------------------+----------------------+----------------------+

Now, here is how to trigger the bug. I change the following line in my xorg.conf:

Original:

Option         "RegistryDwords" "PerfLevelSrc=0x2222"

New:

Option         "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222"
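The only difference between the original and new lines is one extra key in the semicolon-separated RegistryDwords value. If you want to script switching between the two while reproducing, the following minimal Python sketch derives the new value from the original. The helper names are hypothetical. The sketch only manipulates the string. It does not edit xorg.conf or restart X.

```python
from __future__ import annotations


def parse_registry_dwords(value: str) -> dict[str, str]:
    """Split a value like "A=0x1; B=0x2" into an insertion-ordered dict."""
    pairs: dict[str, str] = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ";"
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


def format_registry_dwords(pairs: dict[str, str]) -> str:
    """Join key/value pairs back into the "K=V; K=V" form used above."""
    return "; ".join(f"{k}={v}" for k, v in pairs.items())


def with_powermizer(value: str) -> str:
    """Return the value with PowerMizerEnable=0x1 placed first, other keys kept."""
    pairs = parse_registry_dwords(value)
    merged = {"PowerMizerEnable": "0x1"}
    merged.update({k: v for k, v in pairs.items() if k != "PowerMizerEnable"})
    return format_registry_dwords(merged)
```

`with_powermizer("PerfLevelSrc=0x2222")` returns `"PowerMizerEnable=0x1; PerfLevelSrc=0x2222"`, which is the value on the "New" line.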

So all I do here is enable PowerMizer.
After the change to xorg.conf I have to restart both X11 and my compute-intensive task.
Now the bug occurs.
GPU-Utilization is shown as 0% even though the GPU is actually running at 100%:

root@ht:~# nvidia-smi           
Sat Jun 18 15:06:30 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 69%   62C    P2   196W / 217W |   1027MiB /  8112MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

As proof that my compute-intensive task is running, look at the reported power consumption: 196W / 217W, compared with 35W when idle.
Note that I was unable to trigger the bug by enabling PowerMizer via nvidia-settings.
Enabling PowerMizer that way does not trigger the bug.
It has to be done via xorg.conf.
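You can also detect the mismatch without reading the table by eye. Query nvidia-smi in CSV form and flag GPUs that report near-zero utilization while drawing a large share of their power limit. The following is a minimal Python sketch. The query fields `index`, `utilization.gpu`, `power.draw` and `power.limit` are standard `--query-gpu` fields. The 5% and 50% thresholds are arbitrary assumptions. The sketch also assumes every field is numeric, so on boards that print `[Not Supported]` for power, `float()` will raise.

```python
from __future__ import annotations

import subprocess

# CSV query with no header row and no unit suffixes, e.g. "0, 0, 196.00, 217.00"
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_query(output: str) -> list[tuple[int, float, float, float]]:
    """Turn each CSV line into (index, util %, power draw W, power limit W)."""
    rows = []
    for line in output.strip().splitlines():
        idx, util, draw, limit = (field.strip() for field in line.split(","))
        rows.append((int(idx), float(util), float(draw), float(limit)))
    return rows


def suspicious(rows, max_util=5.0, min_power_frac=0.5) -> list[int]:
    """GPU indices reporting <= max_util % while drawing >= min_power_frac of their limit."""
    return [
        idx
        for idx, util, draw, limit in rows
        if util <= max_util and draw >= min_power_frac * limit
    ]


def sample() -> list[tuple[int, float, float, float]]:
    """Run nvidia-smi once and parse its output (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_query(out)
```

Calling `suspicious(sample())` in a loop while the compute task runs should return `[]` in the healthy case and `[0]` on this machine once the bug is triggered. That expectation assumes the CSV values match the table figures shown above.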


atom

Hi atomu, I think it’s not recommended to use these RegistryDwords. As soon as the issue hits, please provide an NVIDIA bug report by running the nvidia-bug-report.sh script as superuser/root. Which compute-intensive application are you running? Can you share the app, or can this issue be reproduced with other apps too?

Hey sandip, thanks for your reply. As requested, I’ve recreated the case. When the bug hit, I created nvidia-bug-report.log.gz using nvidia-bug-report.sh and sent it to linux-bugs@nvidia.com.

The application I used is my own open-source software. You can clone the repository from GitHub: https://github.com/hashcat/oclHashcat and recreate the case locally. However, I don’t think you need my software. It will make no difference which software you use.

Ticket number 200220143 for reference.