How to monitor card activity (memory, GPU load, ...)

Hi!

In our laboratory we have three Tesla cards plus one Quadro in a single machine. I have to run scientific CUDA-enabled codes written by other people, and I would like to know which cards are used, and what the GPU and memory loads are, when I run these codes. I'll need to do this in a command-line environment. Also, I was wondering if there is a way to select which cards to use at runtime.

Thank you!
Stefano.

This script will work for the Teslas:

[cuda@teslacluster ~]$ cat available_gpu
#!/bin/bash

# Find out how many Teslas are connected to the host
N=$(/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l)
N=$(expr $N - 1)

for i in $(seq 0 $N); do
    # lsof reports a "mem" entry for every process that has the device node mapped
    PROC=$(/usr/sbin/lsof /dev/nvidia$i | grep mem | awk '{print $4}')
    if [ -n "$PROC" ]; then
        echo "card $i in use"
    else
        echo "card $i is available"
    fi
done
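
Made executable with chmod +x, a run on a machine where one of the Teslas is busy would print something along these lines (hypothetical session):

[cuda@teslacluster ~]$ ./available_gpu
card 0 is available
card 1 in use
card 2 is available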

You may want to look at exclusive mode in CUDA 2.2; the driver will automatically select an available device for you.
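
In exclusive compute mode a context can only be created on a GPU that nobody else is using, so the runtime effectively picks a free card for you. If you want to choose cards yourself instead, you can call cudaSetDevice() in the code, or use the CUDA_VISIBLE_DEVICES environment variable on releases that support it. A rough sketch follows; the nvidia-smi option spelling is the one used by current drivers (older ones use numeric codes), and ./my_app is just a placeholder for your application:

# put GPU 0 into exclusive compute mode (needs root)
$ nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# restrict a run to GPUs 1 and 2 only
$ CUDA_VISIBLE_DEVICES=1,2 ./my_app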

Thanks, Mfatica!

This script just shows which cards are in use.

However, I want to measure the memory and GPU usage, like top does for the CPU.

Are there any tools to monitor GPU processes and memory?

You can check the memory usage if the cards are not in exclusive mode; I posted a piece of code on the forum some time ago.

We are considering a top-like tool for the GPU.
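
That code is not reproduced here, but a minimal sketch along the same lines (assuming a toolkit recent enough to provide cudaMemGetInfo() in the runtime API) loops over the devices and prints free/total memory:

// gpu_mem.cu -- illustrative sketch, not the original forum posting
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        size_t free_b = 0, total_b = 0;
        cudaSetDevice(dev);                 /* select the card */
        cudaMemGetInfo(&free_b, &total_b);  /* first runtime call on the card creates its context */
        printf("device %d: %zu MB free of %zu MB\n",
               dev, free_b >> 20, total_b >> 20);
    }
    return 0;
}

Compile with nvcc gpu_mem.cu -o gpu_mem and run it between jobs; note that on a card that is already busy in exclusive mode the context creation, and therefore the query, will fail.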

[nas@nas ~]$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Sun Oct 31 11:16:16 2010

Driver Version : 260.19.12

GPU 0:
    Product Name          : GeForce 9400 GT
    PCI Device/Vendor ID  : 64110de
    PCI Location ID       : 0:2:0
    Display               : Connected
    Temperature           : 47 C
    Fan Speed             : 100%
    Utilization
        GPU               : 51%
        Memory            : 5%
[nas@nas ~]$
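
For a continuously refreshing, top-like view you can simply run nvidia-smi under watch, which is standard on most Linux distributions:

[nas@nas ~]$ watch -n 1 nvidia-smi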
