nvidia-smi and PGI

Hello,

My server has two NVIDIA GPUs (a GTX 1080 and a GTX 680). With nvidia-smi, I have set the “Compute Mode” to “Prohibited” to disable interactive GPU use (the GPUs are only available when a user submits an SGE script to the GPU queue).

If a user tries to execute a CUDA program, the GPU returns an error stating that it is in “Prohibited” mode. However, if a user executes a PGI program, it runs fine even though “Prohibited” mode is enabled.

Is there a PGI tool I could use to block or disable that execution, or a way to check the compute mode before execution (something similar to “nvidia-smi -q -d COMPUTE”)?

Thanks.

Hi CAOS-SysAdmin,

PGI uses the same CUDA runtime and driver as your CUDA programs, so it’s unclear why this would occur. I tried setting “Prohibited” here on my GTX 690 system, and the binary failed to execute as expected.

Could there be something else going on, such as a misconfiguration on the PGI side, or possibly an issue with your driver?
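
As for checking before execution: since the PGI runtime goes through the same CUDA driver, you could also query the compute mode programmatically. Below is a minimal sketch (the file name and build line are my own, not part of any PGI or CUDA tool) that reads each device’s compute mode through the CUDA runtime API, similar to “nvidia-smi -q -d COMPUTE”. I’m assuming attribute queries are still permitted in Prohibited mode, since only context creation is blocked.

/* check_compute_mode.cu -- minimal sketch (hypothetical helper): report each
 * device's compute mode via the CUDA runtime API, similar to
 * "nvidia-smi -q -d COMPUTE".
 * Build (assuming a CUDA toolkit install):
 *   nvcc check_compute_mode.cu -o check_compute_mode
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) {
        printf("No CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < ndev; ++i) {
        int mode = 0;
        /* Attribute queries do not create a context, so this should work
         * even when the compute mode is set to Prohibited. */
        cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, i);
        printf("Device %d compute mode: %s\n", i,
               mode == cudaComputeModeProhibited       ? "Prohibited" :
               mode == cudaComputeModeExclusiveProcess ? "Exclusive Process" :
               mode == cudaComputeModeExclusive        ? "Exclusive Thread" :
                                                         "Default");
    }
    return 0;
}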

-Mat

% nvidia-smi
Thu Dec 14 11:00:59 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 690     On   | 0000:03:00.0     N/A |                  N/A |
| 30%   31C    P8    N/A /  N/A |      1MiB /  1999MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 690     On   | 0000:04:00.0     N/A |                  N/A |
| 30%   30C    P8    N/A /  N/A |     24MiB /  1998MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
% ./md_oacc_base.pgi
... cut ...
--------------------------------------------------------
  Reading coordinates and velocities from md.in
call to cuDevicePrimaryCtxRetain returned error 101: Invalid device

My system is:

Fri Dec 15 09:46:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680     Off  | 00000000:02:00.0 N/A |                  N/A |
| 40%   44C    P0    N/A /  N/A |      0MiB /  1996MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   43C    P5    29W / 250W |      0MiB / 11172MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If I run the bandwidthTest CUDA 9.0 sample, the output is:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1080 Ti
Error: device is running in <Compute Mode Prohibited>, no threads can use ::cudaSetDevice().

However, if I run this example (compiled with “pgcc -acc -fast -Minfo heat_basic.c -o heat_basic_test”), it runs OK:

[...]
Pointers ITER: 996 temp1 22372320  temp3 22289056  temp_tmp 22289056
Pointers ITER: 997 temp1 22289056  temp3 22372320  temp_tmp 22372320
Pointers ITER: 998 temp1 22372320  temp3 22289056  temp_tmp 22289056
Pointers ITER: 999 temp1 22289056  temp3 22372320  temp_tmp 22372320
Time for computing on GPU: 1.20 s
Checking results
OpenACC temperature test was successful!

It seems the example runs OK, so executions are being allowed even with “Prohibited” compute mode enabled…

Could you send me your example md_oacc_base.pgi?

Thanks.

By default with “-acc”, the compiler targets both the CPU and GPU (-ta=host,tesla). If no GPU is available, the binary will still run, but just on the host. I think that’s what’s happening here.

Try compiling with “-ta=tesla:cc60,cc35,cc30” instead. This will create a binary that will only run on NVIDIA devices.
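
If you want to confirm what the runtime is actually doing, here’s a minimal sketch (my own file and variable names, not taken from your md_oacc_base source) that prints which OpenACC device type was selected, using the standard openacc.h API. I’m assuming acc_get_device_type() reflects the device the PGI runtime ends up using; if not, call it after the first compute region.

/* which_device.c -- minimal sketch to check whether an OpenACC binary is
 * using an NVIDIA GPU or falling back to the host.
 * Build: pgcc -acc -Minfo which_device.c -o which_device
 */
#include <stdio.h>
#include <openacc.h>

int main(void)
{
    acc_device_t dev = acc_get_device_type();

    if (dev == acc_device_nvidia)
        printf("Running on an NVIDIA GPU: device %d of %d\n",
               acc_get_device_num(acc_device_nvidia),
               acc_get_num_devices(acc_device_nvidia));
    else if (dev == acc_device_host)
        printf("Running on the host CPU (GPU fallback)\n");
    else
        printf("Other device type: %d\n", (int)dev);

    return 0;
}

Setting ACC_NOTIFY=1 in the environment should also print a line for each kernel launch, so if nothing is printed the compute regions are most likely running on the host.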

-Mat

Thanks a lot!!!