Hi all,
Is there a reliable utility (presumably developed by NVIDIA) that reports GPU utilization statistics for Tesla cards?
Here’s the reason for the question: we have a couple of Tesla S1070 units hooked up to a host system for high-speed simulations. In the future, we might scale up to more S1070/S2070 units running in parallel over a high-speed interconnect. Without knowing GPU utilization, there is no direct way to identify the bottlenecks in such a setup (is the interconnect fast enough to keep the GPUs fully occupied?).
Without a direct measure of GPU load, any kind of scale-up in a cluster environment is pretty much guesswork… Hoping there is a positive answer somewhere…
Thanks for any tips.
No, there is no such tool. Guys at NVIDIA (Tim) have said more than once that they would have provided one already if it were easy…
Here’s hoping that Fermi makes it easy and we will get one for that architecture.
For your particular use case, I would think that application-level benchmarks would be much more informative than simply monitoring GPU load, anyways.
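To make that concrete (just a generic sketch, nothing to do with your actual simulation code; the kernel, the buffer size, and the use of the default stream are all placeholders), timing the host-to-device copy and the kernel separately with CUDA events already tells you whether you are PCIe-bound or compute-bound:

```cpp
// Minimal benchmark sketch: compare time spent moving data over PCIe with
// time spent computing on the GPU. Kernel and sizes are placeholders.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void dummyKernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * 2.0f + 1.0f;   // stand-in for real work
}

int main()
{
    const int n = 1 << 24;                  // ~16M floats, placeholder size
    size_t bytes = n * sizeof(float);

    float *h = (float*)malloc(bytes);       // contents don't matter here
    float *d;
    cudaMalloc(&d, bytes);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);    // PCIe transfer
    cudaEventRecord(t1);
    dummyKernel<<<(n + 255) / 256, 256>>>(d, n);        // compute
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float msCopy = 0.0f, msKernel = 0.0f;
    cudaEventElapsedTime(&msCopy, t0, t1);
    cudaEventElapsedTime(&msKernel, t1, t2);
    printf("H2D copy: %.2f ms, kernel: %.2f ms\n", msCopy, msKernel);

    cudaFree(d); free(h);
    return 0;
}
```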
“fully occupied” is a funny thing when you’re talking about GPUs, and everybody has a different meaning. I don’t know that there’s a magic bullet there.
However, as far as additional tools go, I am well aware that it’s hard to tell what you’re trying to figure out, and I am working to improve that.
Thank you for the replies, guys.
I guess we’ll have to resort to benchmarking when the time comes. It’s a bit of a catch-22, though: we need to know a reasonable GPU/host/interconnect combination before investing in it, but we can’t measure performance until we have the hardware… So we are trying to figure out a good way to get an estimate on a smaller scale…
In any event, some kind of load-measurement utility would be very helpful, since there is really no way to find such estimates online (they depend heavily on the particular parallel algorithm)… And I’m sure there will be more questions like this as GPUs move further into the HPC area.
I’m not sure if this would be of any use to you, but a colleague of mine wrote a full-system simulator for CUDA applications that records the amount of time your application spends doing particular operations by intercepting CUDA calls as your program makes them. For example, you can tell how much time you spend in kernels, copying memory, allocating memory, running host code, etc. You can also change different system parameters to see how they affect your application. For example, you can increase the PCIe bandwidth/latency, the GPU clock frequency, or the malloc latency, or make calls synchronous/asynchronous. There are no pretty GUIs or anything like the Visual Profiler; you would be using a trace-driven architecture simulator on the command line. Let me know if you would be interested and I could possibly send you the code.
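To give a flavor of the kind of information such traces capture (this is not the tool described above, just a hand-rolled illustration of the general idea of timing individual CUDA calls from the application side; the helper names are made up):

```cpp
// Illustration only: wrap the CUDA calls you care about with wall-clock
// timers and accumulate how much time goes to copies vs. kernels.
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

static double now()                       // seconds, wall clock
{
    timeval tv; gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

static double g_copyTime = 0.0, g_kernelTime = 0.0;

// Timed drop-in for cudaMemcpy (hypothetical helper name).
cudaError_t timedMemcpy(void *dst, const void *src, size_t n, cudaMemcpyKind kind)
{
    double t = now();
    cudaError_t err = cudaMemcpy(dst, src, n, kind);
    g_copyTime += now() - t;
    return err;
}

// Call immediately after a kernel launch: the time until the synchronize
// returns approximates how long that kernel ran.
void timeLastKernel()
{
    double t = now();
    cudaThreadSynchronize();              // cudaDeviceSynchronize() nowadays
    g_kernelTime += now() - t;
}

void report()
{
    printf("memcpy: %.3f s, kernels: %.3f s\n", g_copyTime, g_kernelTime);
}
```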
It would be crude and inaccurate and a terrible hack, but you could imagine making a small program that launched a no-op kernel and timed the span between the launch and cudaThreadSynchronize() returning. If it was less than 25 us, no other kernel was running. Otherwise, the delay gives a rough measure of how long another (foreign) kernel kept the GPU busy. Repeat every second or so and you could use some running averages to figure out how often the GPU is busy with other kernels, by treating each no-op launch as a point sample of the load on the GPU.
Yes, it’s ugly; yes, it gives poor time resolution; yes, it has lots of flaws. But it’d be some feedback, anyway.
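Something along these lines (a minimal sketch of the idea above; the 25 us threshold and the one-second interval are just the ballpark numbers from the post, and cudaThreadSynchronize() is the era-appropriate call):

```cpp
// Crude GPU load poller: launch an empty kernel, time how long it takes to
// come back, and treat anything well above bare launch overhead as "some
// other kernel was occupying the GPU". Each iteration is one point sample.
#include <cstdio>
#include <unistd.h>
#include <sys/time.h>
#include <cuda_runtime.h>

__global__ void noopKernel() {}

static double usecNow()
{
    timeval tv; gettimeofday(&tv, 0);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main()
{
    int busySamples = 0, totalSamples = 0;
    for (;;) {
        double t = usecNow();
        noopKernel<<<1, 1>>>();
        cudaThreadSynchronize();          // cudaDeviceSynchronize() nowadays
        double dt = usecNow() - t;

        ++totalSamples;
        if (dt > 25.0) ++busySamples;     // took longer than a bare launch

        printf("sample: %.1f us, busy %d/%d (%.0f%%)\n",
               dt, busySamples, totalSamples,
               100.0 * busySamples / totalSamples);
        sleep(1);                         // one point sample per second
    }
    return 0;
}
```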
The code consists of two portions. One is an add-on module to Ocelot ( http://code.google.com/p/gpuocelot/ ) that creates an annotated trace of all of the CUDA calls that a program makes. The second part is a trace analysis tool that tries to determine the total execution time of the program that generated the trace using simple timing models.
The first part works stand-alone from the second part, and is actually distributed with Ocelot. You will need to check out the current version from Subversion, though, as this was recently added and we don’t have an official release that supports it yet. Basically, you want to compile your program with nvcc and then link it against Ocelot rather than libcudart.so. From that point, you should be able to enable trace generation using a config file and can probably extract a fair amount of information simply by examining the trace.
As for generating a trace, you need a config file in the directory from which you launch your program. An example is given here: ( http://code.google.com/p/gpuocelot/source/…t/config.ocelot ). Change line 21 from CudaRuntimeBase to TraceGeneratingCudaRuntime.
For the actual trace simulator, I’ll send the author an email to see if I can get a copy of the code.
Greg
edit: Also, you need to enable GPU devices in the config file and select a GPU rather than the Ocelot Emulator, otherwise you will get an inflated measurement of kernel execution times.