nvidia-smi reports "No running processes found" on NVIDIA Tesla P100, what could be the cause?

I am logging into a remote server with 4 GPUs installed. I tried rebooting the server, but nvidia-smi still gives the output shown below.
I have not been able to find similar issues online, so I am not sure where to start in fixing the problem. Any help is appreciated!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   29C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   30C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 00000000:88:00.0 Off |                    0 |
| N/A   27C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   30C    P0    25W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Unless you are absolutely certain that there is a known workload on these GPUs, it is safe to assume that they are idling: no memory used, no compute utilization, no running compute processes. They are also really cold (<= 30 deg C).

So how would I activate them? Is there any command for me to activate them remotely or does it require people at the remote location to activate the hardware?

You don’t “activate” a GPU. You can clearly see they are powered on. If you aren’t running an application on it, then you won’t see any running processes.

But I tried to run an application on it, and the error indicates that no GPU is being activated:

mxnet.base.MXNetError: [16:00:41] src/engine/threaded_engine.cc:318: Check failed: device_count_ > 0 (-1 vs. 0) GPU usage requires at least 1 GPU

Verify the CUDA installation.
Instructions are in the relevant install guide.
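
As a rough first pass before going through the install guide, something like the following Python sketch can show which CUDA pieces are visible from the account you ssh in with. The function name cuda_install_report is made up for illustration, and the library names searched for (cuda, cudart, cudnn) are assumptions about a typical Linux install. A "NOT FOUND" only means the item is not on the default search path for this process, not that it is definitely missing from the machine.

```python
import ctypes.util
import shutil


def cuda_install_report():
    """Return a dict mapping each CUDA component to its location, or None."""
    return {
        # Command-line tools, looked up on PATH.
        "nvidia-smi": shutil.which("nvidia-smi"),
        "nvcc": shutil.which("nvcc"),
        # Shared libraries, looked up the way the dynamic linker would.
        "libcuda": ctypes.util.find_library("cuda"),      # ships with the driver
        "libcudart": ctypes.util.find_library("cudart"),  # CUDA runtime (toolkit)
        "libcudnn": ctypes.util.find_library("cudnn"),    # cuDNN, installed separately
    }


if __name__ == "__main__":
    for name, location in cuda_install_report().items():
        print(f"{name:12s} {location or 'NOT FOUND'}")
```

If nvidia-smi is found but libcudart or nvcc is not, that would point at the toolkit (or its library path) rather than the driver.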

Maybe people on the MXNet-specific mailing list or Slack channel have encountered this error before and can help you.

https://mxnet.incubator.apache.org/community/mxnet_channels.html

To me it appears that a dependency (driver or CUDA runtime) might not be met.

The offending piece of code is found here in line 317, and it appears that cudaGetDeviceCount() returns an error (the default device count value of -1 remains in the variable!).
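
To check this outside of MXNet, one option is to call cudaGetDeviceCount() directly from Python through ctypes. This is only a sketch: the helper name cuda_device_count is made up, and it assumes libcudart can be located on the default library search path, which may not hold if the toolkit lives in a non-standard directory. If the call fails here too, the problem is in the driver/runtime setup rather than in MXNet.

```python
import ctypes
import ctypes.util


def cuda_device_count():
    """Return (count, status); count is None when the call could not be made."""
    path = ctypes.util.find_library("cudart")
    if path is None:
        return None, "libcudart not found on the library search path"
    try:
        cudart = ctypes.CDLL(path)
    except OSError as exc:
        return None, f"could not load {path}: {exc}"

    # Start from -1, the same default value seen in the MXNet error message.
    count = ctypes.c_int(-1)
    err = cudart.cudaGetDeviceCount(ctypes.byref(count))
    if err != 0:  # 0 is cudaSuccess; anything else is a cudaError_t code
        return None, f"cudaGetDeviceCount failed with cudaError_t {err}"
    return count.value, "ok"


if __name__ == "__main__":
    count, status = cuda_device_count()
    print(f"device count: {count} ({status})")
```

A non-zero error code can be looked up in the cudaError_t list in the CUDA Runtime API documentation; a mismatch between driver and runtime versions is one common cause.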

Thank you all for your advice! I have found out that I still need to install cuDNN. I skipped it initially because I need to ssh into the GPU server. I am working on a way around this issue. Will keep you all updated.

Hi workthatgpu,
Were you able to fix this issue? I’ve installed cuDNN and am still facing this issue.
Any support will be highly appreciated.

Thanks in advance buddy