nvidia-smi and PGI

Hello,

My server has two NVIDIA GPUs (a GTX 1080 and a GTX 680). With nvidia-smi, I have set the “Compute Mode” to “Prohibited” to disable interactive GPU use (the GPUs are only available when a user submits an SGE script to the GPU queue).

If a user tries to execute a CUDA program, the GPU returns an error stating that it is in “Prohibited” mode. However, if a user executes a PGI program, it runs fine even though “Prohibited” mode is enabled.

Is there a PGI tool I could use to block or disable that execution, or a way to check the compute mode before execution (something similar to “nvidia-smi -q -d COMPUTE”)?

Thanks.

Hi CAOS-SysAdmin,

PGI uses the same CUDA runtime and driver as your CUDA programs, so it’s unclear why this would occur. I tried setting “Prohibited” here on my GTX 690 system, and the binary failed to execute as expected.

Could there be something else going on, such as a misconfiguration on the PGI side, or possibly an issue with your driver?
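
As for checking before execution: since the PGI runtime goes through the same CUDA driver, you could also query the compute mode programmatically. Below is a minimal sketch (the file name and build line are my own, not part of any PGI or CUDA tool) that reads each device’s compute mode through the CUDA runtime API, similar to “nvidia-smi -q -d COMPUTE”. I’m assuming attribute queries are still permitted in Prohibited mode, since only context creation is blocked.

/* check_compute_mode.cu -- minimal sketch (hypothetical helper): report each
 * device's compute mode via the CUDA runtime API, similar to
 * "nvidia-smi -q -d COMPUTE".
 * Build (assuming a CUDA toolkit install):
 *   nvcc check_compute_mode.cu -o check_compute_mode
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) {
        printf("No CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < ndev; ++i) {
        int mode = 0;
        /* Attribute queries do not create a context, so this should work
         * even when the compute mode is set to Prohibited. */
        cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, i);
        printf("Device %d compute mode: %s\n", i,
               mode == cudaComputeModeProhibited       ? "Prohibited" :
               mode == cudaComputeModeExclusiveProcess ? "Exclusive Process" :
               mode == cudaComputeModeExclusive        ? "Exclusive Thread" :
                                                         "Default");
    }
    return 0;
}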

-Mat

% nvidia-smi
Thu Dec 14 11:00:59 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 690     On   | 0000:03:00.0     N/A |                  N/A |
| 30%   31C    P8    N/A /  N/A |      1MiB /  1999MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 690     On   | 0000:04:00.0     N/A |                  N/A |
| 30%   30C    P8    N/A /  N/A |     24MiB /  1998MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
% ./md_oacc_base.pgi
... cut ...
--------------------------------------------------------
  Reading coordinates and velocities from md.in
call to cuDevicePrimaryCtxRetain returned error 101: Invalid device

My system is:

Fri Dec 15 09:46:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680     Off  | 00000000:02:00.0 N/A |                  N/A |
| 40%   44C    P0    N/A /  N/A |      0MiB /  1996MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   43C    P5    29W / 250W |      0MiB / 11172MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If I run the bandwidthTest CUDA 9.0 sample, the output is:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1080 Ti
Error: device is running in <Compute Mode Prohibited>, no threads can use ::cudaSetDevice().

However, if I run this example (compiled with “pgcc -acc -fast -Minfo heat_basic.c -o heat_basic_test”), it runs OK:

[...]
Pointers ITER: 996 temp1 22372320  temp3 22289056  temp_tmp 22289056
Pointers ITER: 997 temp1 22289056  temp3 22372320  temp_tmp 22372320
Pointers ITER: 998 temp1 22372320  temp3 22289056  temp_tmp 22289056
Pointers ITER: 999 temp1 22289056  temp3 22372320  temp_tmp 22372320
Time for computing on GPU: 1.20 s
Checking results
OpenACC temperature test was successful!

It seems the example runs OK, so executions are being allowed even with “Prohibited” compute mode enabled…

Could you send me your example md_oacc_base.pgi?

Thanks.

By default with “-acc”, the compiler targets both the CPU and GPU (-ta=host,tesla). If no GPU is available, the binary will still run, but just on the host. I think that’s what’s happening here.

Try compiling with “-ta=tesla:cc60,cc35,cc30” instead. This will create a binary that will only run on NVIDIA devices.
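
If you want to confirm what the runtime is actually doing, here’s a minimal sketch (my own file and variable names, not taken from your md_oacc_base source) that prints which OpenACC device type was selected, using the standard openacc.h API. I’m assuming acc_get_device_type() reflects the device the PGI runtime ends up using; if not, call it after the first compute region.

/* which_device.c -- minimal sketch to check whether an OpenACC binary is
 * using an NVIDIA GPU or falling back to the host.
 * Build: pgcc -acc -Minfo which_device.c -o which_device
 */
#include <stdio.h>
#include <openacc.h>

int main(void)
{
    acc_device_t dev = acc_get_device_type();

    if (dev == acc_device_nvidia)
        printf("Running on an NVIDIA GPU: device %d of %d\n",
               acc_get_device_num(acc_device_nvidia),
               acc_get_num_devices(acc_device_nvidia));
    else if (dev == acc_device_host)
        printf("Running on the host CPU (GPU fallback)\n");
    else
        printf("Other device type: %d\n", (int)dev);

    return 0;
}

Setting ACC_NOTIFY=1 in the environment should also print a line for each kernel launch, so if nothing is printed the compute regions are most likely running on the host.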

-Mat

Thanks a lot!!!