Monitor k80

josephb99kd · December 6, 2017, 4:01pm

Hi, we have a number of Ubuntu servers (minimal install) with K80s. We’re looking to monitor the GPUs on those servers with any available monitoring tool (e.g. Nagios, PRTG, LogicMonitor) by whatever means, such as SNMP. I was looking for MIBs for the K80 but could not find any. We need historical data on temperature and utilization, and we need to run reports and see graphs. And of course, set up alerts. What are my options to grab that data from Ubuntu? I’m by no means a developer; I just need this info from an IT standpoint.

Thanks!

generix · December 6, 2017, 5:04pm

Use nvidia-smi for that; look at the --loop and --filename options, or the daemon options.

josephb99kd · December 6, 2017, 5:41pm

Hey, thanks for the info.

nvidia-smi is great if I’m monitoring the box live. The --filename option is definitely a way to store historical data, if only the output were CSV or used some other meaningful delimiter. As it is now, I would need to do some extensive parsing to make use of the data.
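On the CSV point: nvidia-smi also accepts --query-gpu together with --format=csv, which prints one delimited row per GPU and can be combined with --loop and --filename. Below is a minimal Python sketch of reading that output, assuming nvidia-smi is on the PATH. The field list and the helper names (parse_gpu_csv, query_gpus) are illustrative choices, not something from this thread.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi; adjust to taste
# (see `nvidia-smi --help-query-gpu` for the full list).
FIELDS = ["timestamp", "index", "name",
          "temperature.gpu", "utilization.gpu", "memory.used"]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    # nvidia-smi separates columns with ", ", so skip the leading space.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        rows.append(dict(zip(FIELDS, (cell.strip() for cell in row))))
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (hypothetical helper)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling query_gpus() from a cron job or a Nagios-style check script would give values that can be compared against thresholds or forwarded to whatever tool draws the graphs.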

Is there anyone who’s currently monitoring their GPUs? I’m sure there’s a great need for this. I need to get the data into some tool where I can configure graphs and set up alerts.

generix · December 6, 2017, 6:41pm

Though you said you’re ‘by no means a developer’, I think your best shot is to write a small Python script that uses the nvidia-ml-py bindings. It shouldn’t be too hard. Bindings are also available for Perl and of course most other languages.
See: NVML (Nvidia management library)
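A rough sketch of what such a script could look like, assuming the nvidia-ml-py package (imported as pynvml) is installed. The function names, the dict keys, and the 80 C threshold are made up for illustration; the optional nvml argument exists only so the function can be exercised without a GPU.

```python
def read_gpus(nvml=None):
    """Return one dict per GPU with temperature (C) and utilization (%)."""
    if nvml is None:
        import pynvml as nvml  # provided by the nvidia-ml-py package
    nvml.nvmlInit()
    try:
        readings = []
        for i in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(i)
            util = nvml.nvmlDeviceGetUtilizationRates(handle)
            readings.append({
                "index": i,
                "temp_c": nvml.nvmlDeviceGetTemperature(
                    handle, nvml.NVML_TEMPERATURE_GPU),
                "gpu_util_pct": util.gpu,
                "mem_util_pct": util.memory,
            })
        return readings
    finally:
        # Always release NVML, even if a query above fails.
        nvml.nvmlShutdown()


def over_threshold(readings, max_temp_c=80):
    """Pick out GPUs hotter than max_temp_c (the threshold is an arbitrary example)."""
    return [r for r in readings if r["temp_c"] > max_temp_c]
```

A monitoring agent could call read_gpus() on a schedule and feed the numbers to its own storage and alerting; over_threshold() only shows the shape of a simple check.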
PS: you can also use the -x switch of nvidia-smi to get XML, but it produces a ton of output that still needs to be parsed.
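If the XML route is preferred, the parsing can stay small by picking out only the few elements of interest. The sketch below uses Python's standard library and assumes the output of nvidia-smi -q -x has a gpu element per device with product_name, temperature/gpu_temp and utilization/gpu_util children; those element names are an assumption to check against the output of the installed driver, and the helper names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_smi_xml(xml_text):
    """Extract a few fields per GPU from `nvidia-smi -q -x` output."""
    root = ET.fromstring(xml_text)
    gpus = []
    # Element names below are assumed; findtext returns None if one is missing.
    for gpu in root.findall("gpu"):
        gpus.append({
            "id": gpu.get("id"),
            "name": gpu.findtext("product_name"),
            "temp": gpu.findtext("temperature/gpu_temp"),
            "gpu_util": gpu.findtext("utilization/gpu_util"),
        })
    return gpus


def query_xml():
    """Run nvidia-smi once and parse its XML report (hypothetical helper)."""
    out = subprocess.run(["nvidia-smi", "-q", "-x"],
                         capture_output=True, text=True, check=True).stdout
    return parse_smi_xml(out)
```

The values come back as strings with units attached (for example a temperature followed by "C"), so they would still need a small conversion step before graphing.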
