How to monitor card activity (memory, GPU load, ...)

Hi!

In our laboratory we have three Tesla cards plus one Quadro in a single machine. I have to run scientific CUDA-enabled codes written by other people, and I would like to know which cards are used, and what the GPU and memory loads are, when I run these codes. I'll need to do this in a command-line environment. Also, I was wondering if there is a way to select which cards to use at runtime.

Thank you!
Stefano.

This script will work for the Teslas:

[cuda@teslacluster ~]$ cat available_gpu
#!/bin/bash

# Find out how many Teslas are connected to the host
N=$(/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l)
N=$(expr $N - 1)

for i in $(seq 0 $N); do
    # lsof reports a "mem" entry for every process that has the device node mapped
    PROC=$(/usr/sbin/lsof /dev/nvidia$i | grep mem | awk '{print $4}')
    if [ -n "$PROC" ]; then
        echo "card $i in use"
    else
        echo "card $i is available"
    fi
done
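
Made executable with chmod +x, a run on a machine where one of the Teslas is busy would print something along these lines (hypothetical session):

[cuda@teslacluster ~]$ ./available_gpu
card 0 is available
card 1 in use
card 2 is available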

You may want to look at exclusive mode in CUDA 2.2; the driver will automatically select an available device for you.
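
In exclusive compute mode a context can only be created on a GPU that nobody else is using, so the runtime effectively picks a free card for you. If you want to choose cards yourself instead, you can call cudaSetDevice() in the code, or use the CUDA_VISIBLE_DEVICES environment variable on releases that support it. A rough sketch follows; the nvidia-smi option spelling is the one used by current drivers (older ones use numeric codes), and ./my_app is just a placeholder for your application:

# put GPU 0 into exclusive compute mode (needs root)
$ nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# restrict a run to GPUs 1 and 2 only
$ CUDA_VISIBLE_DEVICES=1,2 ./my_app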

Thanks, Mfatica!

This script just shows which cards are in use.

However, I want to measure the memory and GPU usage, like top does for the CPU.

Are there any tools to monitor GPU processes and memory?

You can check the memory usage if the cards are not in exclusive mode; I posted a piece of code on the forum some time ago.

We are considering a top-like tool for the GPU.
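
That code is not reproduced here, but a minimal sketch along the same lines (assuming a toolkit recent enough to provide cudaMemGetInfo() in the runtime API) loops over the devices and prints free/total memory:

// gpu_mem.cu -- illustrative sketch, not the original forum posting
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        size_t free_b = 0, total_b = 0;
        cudaSetDevice(dev);                 /* select the card */
        cudaMemGetInfo(&free_b, &total_b);  /* first runtime call on the card creates its context */
        printf("device %d: %zu MB free of %zu MB\n",
               dev, free_b >> 20, total_b >> 20);
    }
    return 0;
}

Compile with nvcc gpu_mem.cu -o gpu_mem and run it between jobs; note that on a card that is already busy in exclusive mode the context creation, and therefore the query, will fail.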

[nas@nas ~]$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Sun Oct 31 11:16:16 2010

Driver Version : 260.19.12

GPU 0:
    Product Name          : GeForce 9400 GT
    PCI Device/Vendor ID  : 64110de
    PCI Location ID       : 0:2:0
    Display               : Connected
    Temperature           : 47 C
    Fan Speed             : 100%
    Utilization
        GPU               : 51%
        Memory            : 5%
[nas@nas ~]$
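
For a continuously refreshing, top-like view you can simply run nvidia-smi under watch, which is standard on most Linux distributions:

[nas@nas ~]$ watch -n 1 nvidia-smi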
