Nvidia-smi isn't logging GPU usage when MPS is enabled

I’m trying to run two parallel TensorFlow processes on a single GPU using the NVIDIA MPS server. It works as intended on a Tesla T4 GPU (Turing), giving ~20 ms inference time for two processes with MPS and ~26 ms without the MPS server. When I run it on a GTX 1060 (pre-Volta), the inference time doesn’t decrease. I know MPS works better on Volta and newer GPUs, but my issue is that nvidia-smi doesn’t seem to report the GPU usage correctly.

I’m setting a memory fraction for each process, and the program runs, giving ~27 ms inference time (similar whether I run with or without MPS) on a video stream with the ssd-inception-v2 model. But the nvidia-smi output doesn’t show the GPU usage during inference. Here’s my code output:

frame no.: 798 PID: 584 Received data True system time 12:46:21.014076 
frame no.: 798 PID: 583 Received data True system time 12:46:21.014130 
PID 584 Model-fraction 0.3 sys-time 12:46:21.042925 Inference-time 
0.0278 loop-time 0.0421 elapsed_time 34.5984 frame_no 798 fps 23.0646 
NumDetec 0.0 CPU 54.2

PID 583 Model-fraction 0.3 sys-time 12:46:21.043024 Inference-time 
0.0281 loop-time 0.0423 elapsed_time 34.5887 frame_no 798 fps 23.0711 
NumDetec 0.0 CPU 54.8
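The per-process memory fraction mentioned above is set up roughly like this (a minimal sketch using the TF 1.x session API; the 0.3 fraction matches the `Model-fraction` in the output above, everything else is an assumption about my setup):

```python
import tensorflow as tf

# Sketch: cap this process at 30% of GPU memory so that two processes
# can share the card under MPS (TF 1.x API; fraction from the logs above).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... load the frozen ssd-inception-v2 graph and run inference here ...
    pass
```

Each of the two processes creates its session with this config before entering the inference loop.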

Here’s nvidia-smi output during the process run.

https://i.stack.imgur.com/MguQh.png
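For reference, starting and stopping the MPS daemon on this machine looks roughly like the standard setup below (a sketch; a single GPU at index 0 and default pipe directories are assumed):

```shell
# Sketch: standard way to start the MPS control daemon (single GPU assumed)
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d          # start the control daemon in background mode

# ... run the two TensorFlow processes ...

echo quit | nvidia-cuda-mps-control  # shut the daemon down afterwards
```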

At the start of the Python program, it shows a "CUDA device not found" error, but it still gives ~26 ms inference time, which suggests the GPU is being used even though nothing shows up in nvidia-smi.

2019-08-21 12:45:46.432372: I 
tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: AVX2 
AVX512F FMA
2019-08-21 12:45:46.434036: E 
tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to 
cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-08-21 12:45:46.434075: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving 
CUDA diagnostic information for host: user
2019-08-21 12:45:46.434083: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 
user-**********
2019-08-21 12:45:46.434178: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda 
reported version is: 430.26.0
2019-08-21 12:45:46.434203: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel 
reported version is: 430.26.0
2019-08-21 12:45:46.434210: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version 
seems to match DSO: 430.26.0
2019-08-21 12:45:46.439039: I 
tensorflow/stream_executor/platform/default/dso_loader.cc:42] 
Successfully opened dynamic library libcuda.so.1
2019-08-21 12:45:46.441147: E 
tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to 
cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-08-21 12:45:46.441182: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving 
CUDA diagnostic information for host: user-**********
2019-08-21 12:45:46.441192: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 
user-********
2019-08-21 12:45:46.441273: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda 
reported version is: 430.26.0
2019-08-21 12:45:46.441301: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel 
reported version is: 430.26.0
2019-08-21 12:45:46.441309: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version 
seems to match DSO: 430.26.0
2019-08-21 12:45:46.443156: I 
tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 
3300000000 Hz
2019-08-21 12:45:46.443822: I 
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x85e69e0 
executing computations on platform Host. Devices:
2019-08-21 12:45:46.443847: I 
tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device 
(0): <undefined>, <undefined>
Graph created
2019-08-21 12:45:46.452790: I 
tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 
3300000000 Hz
2019-08-21 12:45:46.453671: I 
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x85e6500 
executing computations on platform Host. Devices:
2019-08-21 12:45:46.453693: I 
tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device 
(0): <undefined>, <undefined>
Graph created

Any insights on why the GPU usage isn’t recorded by nvidia-smi? Or do I need to make changes in my Python code for MPS to work properly on a pre-Volta GPU?

I believe this is pretty much expected behavior.

TensorFlow uses CUDA stream callbacks. CUDA stream callbacks are supported under MPS with a Volta client, but not with a pre-Volta client.

So TF won’t run on a pre-Volta GPU with CUDA MPS, which is exactly what you are seeing.

If you do a bit of googling on “TensorFlow CUDA MPS”, I think you’ll find the same information.

I’m not going to try to address your claim that the GTX 1060 case “must be using the GPU”. It’s evident from the TF output that it is not.

Hey Robert,
Thanks for the reply. Regarding GPU usage, it was just an observation: when I set CUDA_VISIBLE_DEVICES='' (none), the inference time is always over 150 ms per image. Even the TensorFlow benchmarks at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models report a 40 ms inference time. (Those were measured on a K80, so newer GPUs should give better inference times.)
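The CPU-only comparison above was forced by hiding the GPU before importing TensorFlow, roughly like this (a sketch; the variable must be set before the first TF import, or TF will have already enumerated the devices):

```python
import os

# Hide all CUDA devices from this process; TensorFlow then falls back to
# CPU execution. This must run before `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # imported only after the variable is set

assert os.environ["CUDA_VISIBLE_DEVICES"] == ""
```

With the empty string set, the same pipeline takes 150+ ms per image, which is why I concluded the ~26 ms runs were hitting the GPU.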