I’m trying to figure out how I can monitor the utilization of GPUs running CUDA kernels. Ideally, I’d like to measure on an ongoing basis how idle or busy the GPUs are, something akin to “top”.
At the least, I’d like some way of measuring when kernels start and stop running.
The goal is to monitor performance on a production compute farm, so I don’t have the luxury of (1) running the code under a profiler or (2) forcing developers to use a wrapper API.
Would it be possible to hack the NVIDIA driver to trap kernel launch and completion events?
I’d greatly appreciate any suggestions! Thanks :-)
I am running a farm of 12 PCs, each with two 8800 GTXs, running CUDA processing applications under Windows XP, and would dearly like to measure the performance/utilisation of each GPU in order to optimize the software.
This is coming in an upcoming release, but don’t confuse GPU utilization statistics with the results of a profiler. Utilization statistics alone will not necessarily enable you to improve performance.
Now that 3.1 is out, do you have any update on the timing of this utility? Basically, we’re using an S1070 in a multiuser environment, and I’d like to be able to tell if anyone else is using it. As the original poster said, something akin to “top” for the GPU.
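For anyone landing here later: on drivers whose nvidia-smi supports the `--query-gpu` flags, you can approximate a “top”-style view by polling utilization once per second. Below is a minimal sketch of that idea in Python; the `parse_sample` helper and the polling loop are my own illustrative names, not an official tool, and the exact flags assume a reasonably modern nvidia-smi.

```python
# Hypothetical "gpu-top" poller: periodically samples per-GPU utilization
# via nvidia-smi. Assumes a driver whose nvidia-smi supports --query-gpu
# (older drivers, e.g. the XP-era ones above, may not).
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_sample(csv_text):
    """Parse one CSV sample into {gpu_index: (util_percent, mem_used_mib)}."""
    stats = {}
    for line in csv_text.strip().splitlines():
        idx, util, mem = (field.strip() for field in line.split(","))
        stats[int(idx)] = (int(util), int(mem))
    return stats

def poll(interval=1.0):
    """Print a utilization sample every `interval` seconds, akin to top."""
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True).stdout
        print(parse_sample(out))
        time.sleep(interval)
```

Note this is sampling-based: it tells you how busy each GPU is over time, but it will not catch individual kernel start/stop events; for that you would still need driver-level hooks or a profiler.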