nvidia-smi reporting 0% GPU utilization

I’m having an issue where the output of nvidia-smi doesn’t seem to match the amount of work being done on the machine. I am running the same software on two different servers, but on one server I’m getting the following output, with a very odd 0% GPU-Util reading and also very low power usage:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 6000                Off | 00000000:3B:00.0 Off |                    0 |
| N/A   50C    P0              72W / 250W |   5799MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Quadro RTX 6000                Off | 00000000:AF:00.0 Off |                    0 |
| N/A   61C    P0              82W / 250W |   1792MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Quadro RTX 6000                Off | 00000000:D8:00.0 Off |                    0 |
| N/A   30C    P0              50W / 250W |      0MiB / 23040MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   2993017      C   audiovisualizer                            1072MiB |
|    0   N/A  N/A   2993018      C   audiovisualizer                            1246MiB |
|    0   N/A  N/A   2993019      C   audiovisualizer                             974MiB |
|    0   N/A  N/A   2993020      C   audiovisualizer                            1216MiB |
|    0   N/A  N/A   2993021      C   audiovisualizer                            1240MiB |
|    1   N/A  N/A   2993022      C   audiovisualizer                            1658MiB |
+---------------------------------------------------------------------------------------+

On the other server, running the same software (but a different driver version), I’m getting this output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Quadro R...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   48C    P0   110W / 250W |   5714MiB / 22698MiB |     98%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA Quadro R...  On   | 00000000:AF:00.0 Off |                    0 |
| N/A   63C    P0   176W / 250W |   1361MiB / 22698MiB |     92%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA Quadro R...  On   | 00000000:D8:00.0 Off |                    0 |
| N/A   26C    P8    13W / 250W |      0MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    647652      C   audiovisualizer                  1348MiB |
|    0   N/A  N/A    648903      C   audiovisualizer                   982MiB |
|    0   N/A  N/A    648908      C   audiovisualizer                  1294MiB |
|    0   N/A  N/A    648911      C   audiovisualizer                  1070MiB |
|    0   N/A  N/A    648917      C   audiovisualizer                  1070MiB |
|    1   N/A  N/A   1348854      C   audiovisualizer                  1216MiB |
+-----------------------------------------------------------------------------+

This looks far more in line with the anticipated load on the GPUs. I’m also only using the first two GPUs on each server, so the 0% utilization for the third GPU is correct. On the first server, though, the third GPU, which I’m not using, is always stuck at 4% utilization, while the first two GPUs, which I am using, are stuck at 0%.

The software itself produces correct output on both servers; i.e., the jobs on the server reporting 0% GPU utilization are running and completing correctly.

Any ideas why nvidia-smi is reporting incorrect utilization?
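For anyone debugging something similar: a sketch of how one could sample the same counters directly, rather than reading the default table view. This assumes the standard nvidia-smi query interface; the polling interval is an arbitrary choice, and of course it needs an NVIDIA GPU and driver to run.

```shell
# Poll per-GPU utilization and power draw once per second as CSV.
# (Interval of 1 s is arbitrary; adjust -l as needed.)
nvidia-smi --query-gpu=index,utilization.gpu,power.draw --format=csv -l 1

# Alternatively, dmon shows SM utilization ("sm" column) per device:
nvidia-smi dmon -s u
```

If these counters also read 0% while the jobs are busy, the problem is in the driver-level accounting itself rather than in how the summary table renders it.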

Update: I have upgraded the second server, the one that was reporting correctly, from CUDA 11.3 to 12.2, and it too is now reporting 0% GPU utilization, mirroring the first server’s output.

One more update (and solution): I recompiled my binaries with nvcc using the --gpu-architecture=native flag, and GPU utilization is now being reported correctly on the second server that I had just upgraded to CUDA 12.2. Previously I was not specifying a GPU architecture to nvcc at all, just using the default compilation options. I tried recompiling with the same nvcc flags on the original server (the first one to be upgraded to CUDA 12.2), but that server is still reporting 0% GPU utilization.
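For reference, a sketch of the rebuild. The source and output file names here are placeholders, not my actual project files, and -O2 is just an illustrative optimization flag; the relevant change is adding --gpu-architecture=native (supported since CUDA 11.5), which makes nvcc target the compute capability of the GPUs installed on the build machine instead of the default architecture.

```shell
# Rebuild targeting the architecture of the locally installed GPUs.
# "app.cu" and "audiovisualizer" are placeholder names for illustration.
nvcc --gpu-architecture=native -O2 -o audiovisualizer app.cu
```

Note that a binary built with =native is tied to the GPU architecture of the machine it was compiled on, so this needs to be done on (or for) each target server.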