Hello,
My server has 2 NVIDIA GPUs (a GTX 1080 and a GTX 680). With nvidia-smi, I have set the "Compute Mode" to "Prohibited" to disable GPU use in interactive sessions (the GPUs are available only if a user submits an SGE script to the GPU queue).
If a user tries to execute a CUDA program, the GPU returns an error saying that it is in "Prohibited" mode. However, if a user executes a PGI-compiled program, it runs fine even though "Prohibited" mode is enabled.
Is there a PGI tool I could use to block or disable that execution, or a way to check the compute mode before execution (something similar to "nvidia-smi -q -d COMPUTE")?
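For reference, this is the kind of pre-execution check I have in mind; just a sketch using the CUDA runtime API (the file name check_mode.c is my own example, compiled separately, e.g. with nvcc):

/* check_mode.c - example pre-execution check (a sketch, not a PGI tool).
   Exits non-zero if any GPU is in "Prohibited" compute mode, so a
   wrapper script or SGE prolog could refuse to start the job. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    int dev;

    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }
    for (dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        if (prop.computeMode == cudaComputeModeProhibited) {
            fprintf(stderr, "GPU %d (%s) is in Prohibited compute mode\n",
                    dev, prop.name);
            return 2;
        }
    }
    return 0;  /* all devices allow compute contexts */
}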
Thanks.
Hi CAOS-SysAdmin,
PGI programs use the same CUDA runtime and driver as your CUDA programs, so it's unclear why this would occur. I tried setting "Prohibited" mode here on my GTX 690 system, and the binary failed to execute as expected.
Could there be something else going on, such as a misconfiguration when using PGI, or possibly an issue with your driver?
-Mat
% nvidia-smi
Thu Dec 14 11:00:59 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 690     On   | 0000:03:00.0     N/A |                  N/A |
| 30%   31C    P8    N/A /  N/A |      1MiB /  1999MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 690     On   | 0000:04:00.0     N/A |                  N/A |
| 30%   30C    P8    N/A /  N/A |     24MiB /  1998MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+
% ./md_oacc_base.pgi
... cut ...
--------------------------------------------------------
Reading coordinates and velocities from md.in
call to cuDevicePrimaryCtxRetain returned error 101: Invalid device
My system is:
Fri Dec 15 09:46:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680      Off | 00000000:02:00.0 N/A |                  N/A |
| 40%   44C    P0    N/A /  N/A |      0MiB /  1996MiB |     N/A   Prohibited |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...   Off | 00000000:03:00.0 Off |                  N/A |
|  0%   43C    P5    29W / 250W |      0MiB / 11172MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
If I run the bandwidthTest CUDA 9.0 sample, the output is:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1080 Ti
Error: device is running in <Compute Mode Prohibited>, no threads can use ::cudaSetDevice().
However, if I run this example (compiled with “pgcc -acc -fast -Minfo heat_basic.c -o heat_basic_test”), it runs OK:
[...]
Pointers ITER: 996 temp1 22372320 temp3 22289056 temp_tmp 22289056
Pointers ITER: 997 temp1 22289056 temp3 22372320 temp_tmp 22372320
Pointers ITER: 998 temp1 22372320 temp3 22289056 temp_tmp 22289056
Pointers ITER: 999 temp1 22289056 temp3 22372320 temp_tmp 22372320
Time for computing on GPU: 1.20 s
Checking results
OpenACC temperature test was successful!
It seems the example runs OK, so the "Prohibited" compute mode is still allowing execution…
Could you send me your example md_oacc_base.pgi?
Thanks.
By default with “-acc”, the compiler targets both the CPU and GPU (-ta=host,tesla). If no GPU is available, the binary will still run, but just on the host. I think that’s what’s happening here.
Try compiling with "-ta=tesla:cc60,cc35,cc30" instead. This creates a binary that will only run on NVIDIA devices.
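If it helps to confirm what the heat_basic binary is doing, here's a small sketch (not your code, just an illustration) that asks the OpenACC runtime which device type it actually selected. Built with "-acc" alone it should report the host on your prohibited GPUs, while a tesla-only build should fail up front:

/* which_device.c - illustration only: report the OpenACC device type
   that the runtime actually used for a trivial compute region. */
#include <stdio.h>
#include <openacc.h>

int main(void)
{
    int i;
    double sum = 0.0;
    acc_device_t dev;

    /* A trivial region so the runtime has to pick a device. */
    #pragma acc parallel loop reduction(+:sum)
    for (i = 0; i < 1000; ++i)
        sum += i;

    dev = acc_get_device_type();
    if (dev == acc_device_host)
        printf("Ran on the host (GPU fallback), sum=%.0f\n", sum);
    else
        printf("Ran on an accelerator (device type %d), sum=%.0f\n",
               (int)dev, sum);
    return 0;
}

I believe setting the environment variable PGI_ACC_NOTIFY=1 before running will also print a line for each kernel launch, which is another quick way to see whether anything actually executed on the GPU.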
-Mat