No running processes found by NVIDIA Tesla P100, what could be the cause?

workthatgpu · July 3, 2018, 9:31pm

I am logged into a remote server with 4 GPUs installed. I tried rebooting the server, but nvidia-smi still gives the output shown below.
I have not been able to find similar issues online, so I am not sure where to start in fixing the problem. Any help is appreciated!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:04:00.0 Off |                    0 |
| N/A   29C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:05:00.0 Off |                    0 |
| N/A   30C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  On   | 00000000:88:00.0 Off |                    0 |
| N/A   27C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   30C    P0    25W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

cbuchner1 · July 3, 2018, 10:09pm

Unless you are absolutely certain that there is a known workload on these GPUs, it is safe to assume that they are idling: no memory used, no compute utilization, no running compute processes. They are also really cold (<= 30 deg C).

workthatgpu · July 3, 2018, 10:27pm

So how would I activate them? Is there any command for me to activate them remotely or does it require people at the remote location to activate the hardware?

zjw518 · July 3, 2018, 10:39pm

You don’t “activate” a GPU. You can clearly see they are powered on. If you aren’t running an application on it, then you won’t see any running processes.
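
To check the same thing from a script rather than by reading the table, here is a minimal sketch in Python. It assumes nvidia-smi is on the PATH and uses its CSV query mode with the pid, process_name and used_memory fields (listed by nvidia-smi --help-query-compute-apps). The function names parse_compute_apps and list_compute_apps are made up for illustration.

```python
import csv
import io
import subprocess

# Ask nvidia-smi for the compute processes as CSV, without a header row.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Turn the CSV text printed by nvidia-smi into a list of dicts.

    An empty string (nothing running on any GPU) gives an empty list.
    """
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, name, mem = (f.strip() for f in fields)
        if not pid.isdigit():
            continue  # skip anything that is not a real process row
        rows.append({"pid": int(pid), "process_name": name, "used_memory": mem})
    return rows


def list_compute_apps():
    """Run nvidia-smi and return the parsed list of compute processes."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

An empty list from list_compute_apps() corresponds to "No running processes found" in the table: the GPUs are visible to the driver, but nothing is currently using them.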

workthatgpu · July 3, 2018, 11:01pm

But I tried to run an application on it, and the error indicates that no GPU is activated:

mxnet.base.MXNetError: [16:00:41] src/engine/threaded_engine.cc:318: Check failed: device_count_ > 0 (-1 vs. 0) GPU usage requires at least 1 GPU

Robert_Crovella · July 3, 2018, 11:27pm

Verify the CUDA installation.
Instructions are in the relevant install guide.

cbuchner1 · July 3, 2018, 11:31pm

Maybe people on the MXNet-specific mailing list or Slack channel have encountered this error before and can help you.

To me it appears that a dependency (driver or CUDA runtime) might not be met.

github.com

apache/incubator-mxnet/blob/master/src/engine/threaded_engine.cc

The offending piece of code is found here in line 317, and it appears that cudaGetDeviceCount() returns with an error (the default device count value of -1 remains in the variable, which matches the "-1 vs. 0" in the error message!)
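
To see which error cudaGetDeviceCount() actually returns, independent of MXNet, one option is to call the CUDA runtime directly. Below is a minimal sketch in Python using ctypes. It assumes the runtime library (libcudart) can be located by the system's library search; the function name probe_cuda_runtime and its messages are made up for illustration and are not MXNet code. Like the MXNet check, it starts from a device count of -1.

```python
import ctypes
import ctypes.util


def probe_cuda_runtime(library_name="cudart"):
    """Call cudaGetDeviceCount() directly and report what it says.

    Returns (device_count, message). device_count stays at -1 whenever
    the runtime cannot be loaded or the call reports an error.
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return -1, "CUDA runtime library not found on the library search path"
    try:
        lib = ctypes.CDLL(path)
    except OSError as exc:
        return -1, "could not load %s: %s" % (path, exc)

    # const char* cudaGetErrorString(cudaError_t error)
    lib.cudaGetErrorString.restype = ctypes.c_char_p
    lib.cudaGetErrorString.argtypes = [ctypes.c_int]

    # cudaError_t cudaGetDeviceCount(int* count)
    count = ctypes.c_int(-1)
    err = lib.cudaGetDeviceCount(ctypes.byref(count))
    if err != 0:
        text = lib.cudaGetErrorString(err).decode()
        return -1, "cudaGetDeviceCount failed (error %d): %s" % (err, text)
    return count.value, "ok"
```

Running print(probe_cuda_runtime()) on the server should show either the number of GPUs or the runtime's own error string, which usually says more about the unmet dependency than the MXNet check does.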

workthatgpu · July 3, 2018, 11:53pm

Thank you all for your advice! I have found out that I still need to install cuDNN. I skipped it initially because I need to ssh into the GPU server. I am working on a way around this issue. Will keep you all updated.

schopra1978 · May 3, 2019, 9:38am

Hi workthatgpu,
Were you able to fix this issue? I’ve installed cuDNN and am still facing it.
Any support will be highly appreciated.

Thanks in advance buddy
