Nvidia-smi isn't logging GPU usage when MPS is enabled

I’m trying to run two parallel TensorFlow processes on a single GPU using the NVIDIA MPS server. It works as intended on a Tesla T4 GPU (Turing), giving ~20 ms inference time for two processes with MPS and ~26 ms without the MPS server. When I run it on a GTX 1060 (pre-Volta), the inference time doesn’t decrease. I know MPS works better on Volta and newer GPUs, but my issue is that nvidia-smi doesn’t seem to report the GPU usage correctly.

I’m setting a memory fraction for each process, and the program runs, giving ~27 ms inference time (similar whether I run with or without MPS) on a video stream with the ssd-inception-v2 model. But the nvidia-smi output doesn’t show the GPU usage during inference. Here’s my code output:

frame no.: 798 PID: 584 Received data True system time 12:46:21.014076 
frame no.: 798 PID: 583 Received data True system time 12:46:21.014130 
PID 584 Model-fraction 0.3 sys-time 12:46:21.042925 Inference-time 
0.0278 loop-time 0.0421 elapsed_time 34.5984 frame_no 798 fps 23.0646 
NumDetec 0.0 CPU 54.2

PID 583 Model-fraction 0.3 sys-time 12:46:21.043024 Inference-time 
0.0281 loop-time 0.0423 elapsed_time 34.5887 frame_no 798 fps 23.0711 
NumDetec 0.0 CPU 54.8
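The per-process memory fraction mentioned above is set up roughly like this (a minimal sketch using the TF 1.x session API; the 0.3 fraction matches the `Model-fraction` in the output above, everything else is an assumption about my setup):

```python
import tensorflow as tf

# Sketch: cap this process at 30% of GPU memory so that two processes
# can share the card under MPS (TF 1.x API; fraction from the logs above).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... load the frozen ssd-inception-v2 graph and run inference here ...
    pass
```

Each of the two processes creates its session with this config before entering the inference loop.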

Here’s nvidia-smi output during the process run.

https://i.stack.imgur.com/MguQh.png
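For reference, starting and stopping the MPS daemon on this machine looks roughly like the standard setup below (a sketch; a single GPU at index 0 and default pipe directories are assumed):

```shell
# Sketch: standard way to start the MPS control daemon (single GPU assumed)
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d          # start the control daemon in background mode

# ... run the two TensorFlow processes ...

echo quit | nvidia-cuda-mps-control  # shut the daemon down afterwards
```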

At the start of the Python program, it shows a "CUDA device not found" error, but it still gives ~26 ms inference time, which suggests the GPU is being used even though nothing shows up in nvidia-smi.

2019-08-21 12:45:46.432372: I 
tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: AVX2 
AVX512F FMA
2019-08-21 12:45:46.434036: E 
tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to 
cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-08-21 12:45:46.434075: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving 
CUDA diagnostic information for host: user
2019-08-21 12:45:46.434083: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 
user-**********
2019-08-21 12:45:46.434178: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda 
reported version is: 430.26.0
2019-08-21 12:45:46.434203: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel 
reported version is: 430.26.0
2019-08-21 12:45:46.434210: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version 
seems to match DSO: 430.26.0
2019-08-21 12:45:46.439039: I 
tensorflow/stream_executor/platform/default/dso_loader.cc:42] 
Successfully opened dynamic library libcuda.so.1
2019-08-21 12:45:46.441147: E 
tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to 
cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-08-21 12:45:46.441182: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving 
CUDA diagnostic information for host: user-**********
2019-08-21 12:45:46.441192: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 
user-********
2019-08-21 12:45:46.441273: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda 
reported version is: 430.26.0
2019-08-21 12:45:46.441301: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel 
reported version is: 430.26.0
2019-08-21 12:45:46.441309: I 
tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version 
seems to match DSO: 430.26.0
2019-08-21 12:45:46.443156: I 
tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 
3300000000 Hz
2019-08-21 12:45:46.443822: I 
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x85e69e0 
executing computations on platform Host. Devices:
2019-08-21 12:45:46.443847: I 
tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device 
(0): <undefined>, <undefined>
Graph created
2019-08-21 12:45:46.452790: I 
tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 
3300000000 Hz
2019-08-21 12:45:46.453671: I 
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x85e6500 
executing computations on platform Host. Devices:
2019-08-21 12:45:46.453693: I 
tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device 
(0): <undefined>, <undefined>
Graph created

Any insights on why the GPU usage isn’t recorded by nvidia-smi? Or do I need to make changes in my Python code for MPS to work properly on a pre-Volta GPU?

I believe this is pretty much expected behavior.

TensorFlow uses CUDA stream callbacks. CUDA stream callbacks are supported under MPS with a Volta client, but not with a pre-Volta client.

So TF won’t run on a pre-Volta GPU with CUDA MPS, which is exactly what you are seeing.

If you do a bit of googling on “TensorFlow CUDA MPS”, I think you’ll find the same information.

I’m not going to try to address your claim that the GTX 1060 case “must be using the GPU”. It’s evident from the TF output that it is not.

Hey Robert,
Thanks for the reply. Regarding GPU usage, it was just an observation: when I set CUDA_VISIBLE_DEVICES='' (none), the inference time is always over 150 ms per image. Even the TensorFlow benchmarks at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models report a 40 ms inference time. (Those were measured on a K80, so newer GPUs should give better inference times.)
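The CPU-only comparison above was forced by hiding the GPU before importing TensorFlow, roughly like this (a sketch; the variable must be set before the first TF import, or TF will have already enumerated the devices):

```python
import os

# Hide all CUDA devices from this process; TensorFlow then falls back to
# CPU execution. This must run before `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # imported only after the variable is set

assert os.environ["CUDA_VISIBLE_DEVICES"] == ""
```

With the empty string set, the same pipeline takes 150+ ms per image, which is why I concluded the ~26 ms runs were hitting the GPU.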