Only the K40c is being used for computation out of two GPUs; the other one is a K5200.

Hi,

I have installed two graphics cards in my Dell Precision Tower 5810 machine: a Tesla K40c and a Quadro K5200. When I run a computation, the K5200 is never used; the work always goes to the K40c. Does anyone have any idea what is going on?
Here is some nvidia-smi output:

$ nvidia-smi
Mon Oct 19 13:44:26 2015
+------------------------------------------------------+
| NVIDIA-SMI 352.41     Driver Version: 352.41         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K5200        Off  | 0000:03:00.0     Off |                    0 |
| 28%   46C    P8    21W / 150W |     16MiB /  7678MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 0000:04:00.0     Off |                    0 |
| 25%   51C    P0   138W / 235W |   1482MiB / 11519MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      2088    C   .../mhasan/_caffe/.build_release/tools/caffe  1456MiB |
+-----------------------------------------------------------------------------+
$ nvidia-smi  -q -d CLOCK

==============NVSMI LOG==============

Timestamp                           : Mon Oct 19 13:41:17 2015
Driver Version                      : 352.41

Attached GPUs                       : 2
GPU 0000:03:00.0
    Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
    SM Clock Samples
        Duration                    : 3.18 sec
        Number of Samples           : 4
        Max                         : 875 MHz
        Min                         : 324 MHz
        Avg                         : 832 MHz
    Memory Clock Samples
        Duration                    : 3.18 sec
        Number of Samples           : 4
        Max                         : 3004 MHz
        Min                         : 324 MHz
        Avg                         : 3004 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A

GPU 0000:04:00.0
    Clocks
        Graphics                    : 745 MHz
        SM                          : 745 MHz
        Memory                      : 3004 MHz
    Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Default Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
    SM Clock Samples
        Duration                    : 0.00 sec
        Number of Samples           : 2
        Max                         : 745 MHz
        Min                         : 324 MHz
        Avg                         : 745 MHz
    Memory Clock Samples
        Duration                    : 0.00 sec
        Number of Samples           : 2
        Max                         : 3004 MHz
        Min                         : 324 MHz
        Avg                         : 3004 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
$ nvidia-smi -q -d SUPPORTED_CLOCKS

==============NVSMI LOG==============

Timestamp                           : Mon Oct 19 13:43:10 2015
Driver Version                      : 352.41

Attached GPUs                       : 2
GPU 0000:03:00.0
    Supported Clocks
        Memory                      : 3004 MHz
            Graphics                : 875 MHz
            Graphics                : 771 MHz
            Graphics                : 666 MHz
            Graphics                : 549 MHz
        Memory                      : 810 MHz
            Graphics                : 549 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz

GPU 0000:04:00.0
    Supported Clocks
        Memory                      : 3004 MHz
            Graphics                : 875 MHz
            Graphics                : 810 MHz
            Graphics                : 745 MHz
            Graphics                : 666 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz
$ sudo nvidia-smi -ac 3004,875 -i 0
Setting applications clocks is not supported for GPU 0000:03:00.0.
Treating as warning and moving on.
All done.

Thanks.
Hasan

This is all expected behavior.

Setting application clocks is not supported on your K5200 GPU.

Apart from that, CUDA applications that only use a single GPU will generally default to using a particular GPU in your system. If you want to “steer” an application to use another GPU, you could try using the CUDA_VISIBLE_DEVICES environment variable.

See: Programming Guide :: CUDA Toolkit Documentation
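
As a minimal sketch of that suggestion, the variable can be set for a single command rather than for the whole shell session. The helper name run_on_gpu is hypothetical, and the echo only stands in for a real CUDA program. The index is interpreted in CUDA's own device numbering, which may not match the numbering that nvidia-smi prints.

```shell
# Hypothetical helper: run one command with a single CUDA device visible.
# $1 is a CUDA device index (not necessarily the nvidia-smi index);
# the remaining arguments are the command to run.
run_on_gpu() {
    gpu_index="$1"
    shift
    # The assignment applies only to the environment of this one command.
    CUDA_VISIBLE_DEVICES="$gpu_index" "$@"
}

# Stand-in for a real CUDA application: print what the child process sees.
run_on_gpu 1 sh -c 'echo "visible devices: $CUDA_VISIBLE_DEVICES"'
```

Setting the variable per command this way makes it possible to start two jobs from the same shell and point each one at a different GPU.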

Hi txbob,

Thanks for your reply. I made GPU 0 the only visible device by setting the corresponding environment variable as follows:

export CUDA_VISIBLE_DEVICES=0

GPU 0 is still not utilized; the computation goes straight to GPU 1.

$ nvidia-smi
Mon Oct 19 14:41:01 2015
+------------------------------------------------------+
| NVIDIA-SMI 352.41     Driver Version: 352.41         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K5200        Off  | 0000:03:00.0     Off |                    0 |
| 31%   50C    P8    21W / 150W |    107MiB /  7678MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 0000:04:00.0     Off |                    0 |
| 33%   73C    P0   130W / 235W |   1606MiB / 11519MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2799    C   /home/mhasan/torch/install/bin/luajit           89MiB |
|    1      2743    C   .../mhasan/_caffe/.build_release/tools/caffe  1456MiB |
|    1      2799    C   /home/mhasan/torch/install/bin/luajit          121MiB |
+-----------------------------------------------------------------------------+

Try this instead:

export CUDA_VISIBLE_DEVICES="1"

The enumeration order in nvidia-smi is not always the same as the CUDA enumeration order.

By default, CUDA tries to enumerate the most powerful GPU first. Here that puts the K40c before the K5200, so the K5200 is CUDA device 1 even though nvidia-smi lists it as GPU 0.
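
If you would rather have the CUDA numbering follow the nvidia-smi numbering, one option is the CUDA_DEVICE_ORDER environment variable. The sketch below assumes a CUDA 7.0 or newer runtime, which is not confirmed in this thread; older runtimes ignore the variable. The nvidia-smi query is guarded so the snippet also runs on a machine without the driver installed.

```shell
# Ask the CUDA runtime to enumerate devices by PCI bus ID instead of
# "fastest first" (assumption: CUDA 7.0 or newer).
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# With PCI bus ordering, index 0 should be the card at 0000:03:00.0,
# which is the K5200 in the nvidia-smi listing above.
export CUDA_VISIBLE_DEVICES=0

# Optional check: print index, name and bus ID as nvidia-smi sees them.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv
fi
```

Both variables must be set in the environment before the CUDA application starts; changing them afterwards has no effect on a process that is already running.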

Thanks a lot!
Both of the GPUs are now doing computations.

$ nvidia-smi
Mon Oct 19 15:00:02 2015
+------------------------------------------------------+
| NVIDIA-SMI 352.41     Driver Version: 352.41         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K5200        Off  | 0000:03:00.0     Off |                    0 |
| 34%   59C    P0    73W / 150W |    125MiB /  7678MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 0000:04:00.0     Off |                    0 |
| 33%   72C    P0   129W / 235W |   1482MiB / 11519MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2987    C   /home/mhasan/torch/install/bin/luajit          108MiB |
|    1      2743    C   .../mhasan/_caffe/.build_release/tools/caffe  1456MiB |
+-----------------------------------------------------------------------------+