GPU load monitoring tool now available!

I’ve seen several posts on the forums here about monitoring GPU load, and I’ve repeatedly asked for the CUDA driver to publish performance counters on Windows (so that you can use standard monitoring tools to read/query the data), but to no avail.

But now, it seems that the latest version of GPU-Z has added this functionality! Check it out:

http://www.techpowerup.com/downloads/1709/…U-Z_v0.3.8.html

Yep, that doesn’t work with CUDA.

Do you recommend any monitoring tools for Linux?
Thanks a lot!

nvidia-smi for monitoring temps. As far as monitoring load goes, there’s nothing that does it robustly at the moment.
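
For anyone who wants to log temperatures from a script rather than read them by eye, here is a minimal Python sketch. It assumes your driver ships an nvidia-smi that accepts the `--query-gpu` and `--format` options (older builds may not), and the function names are just illustrative. It only reads temperature, not load.

```python
import subprocess

# Assumption: this nvidia-smi build supports --query-gpu and --format.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temperatures(csv_text):
    """Map GPU index -> temperature in C (None when the value is not numeric)."""
    temps = {}
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        index = int(fields[0])
        try:
            temps[index] = int(fields[1])
        except ValueError:
            temps[index] = None  # e.g. the card does not report a temperature
    return temps


def read_temperatures():
    """Run nvidia-smi once and return the parsed temperatures."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_temperatures(result.stdout)
```

Calling `read_temperatures()` in a loop with a sleep between calls gives a simple temperature log; parsing is kept separate so it can be checked without a GPU present.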

Does it just measure OpenGL or something then? I started it up, watched it say 0% load, then started the Nbody example, the load went to ~95%, then back to 0% when I closed it.

Measures CUDA load just fine on my 8800GT, 9600GSO and GT220 cards

Christian

The counter deals with 3D just fine, but it doesn’t seem to do anything useful (or at least reliable) with CUDA. (If it were this easy, I’m pretty sure I’d have released more management applications by now.)

I don’t run CUDA, but thanks for this tip, it’s fantastic!

Really useful to see exactly how much memory is being used; it lets you see what your card can handle and how AA affects memory and core usage.

If only there were a sidebar gadget that read from this…

On my 8600 GT, the load stays at 0% whether or not I run an application (nbody or a game).

With the new version of GPU-z, I get a mixed bag of results.

Running Sandra on (1) 295…

The OpenCL Test runs at 58% on 1/2 of my 295, and 19% on the other 1/2.

First 1/2 reporting in at 58% load:

Second 1/2 with 19% load:

I tend to think with OpenCL, it really is using only 1/2 of my 295, and the 19% is just overhead?

With the CUDA Test, 1/2 of my 295 runs at 99%, and the other has intermittent processing issues causing GPU-z to not report it correctly?

First 1/2 reporting 99% load:

Second 1/2 with intermittent processing issues reporting 0% load:

On the Compute Shader Test, I run at 99%, and 0% load.

First 1/2 of my 295 reporting 99% load:

Second 1/2 of my 295, reporting 0% load:

Such a waste of system resources… Makes me cry!

We need the University of Illinois boys, to bring them up to speed on having your CUDA Accelerators auto-detect, and how to successfully have them get added into the CUDA Device Pool. :)

I wonder whether the intermittent processing issues during the CUDA test hurt our score.

I find it odd that in both the OpenCL, and the Compute Shader test, 1/2 of my 295 beats my 280…

But in the CUDA test, where both sides of my 295 are supposedly being used, it does not beat my 280 by more than 2x.

It is also the only test that appears to have an intermittent processing issue, that causes GPU-z to report a 0% load.

This could be related, or does CUDA have some scaling issues calculating this test?

I tend to believe that 1/2 of my 295 is not calculating 100% of the time in this CUDA test.

FYI - My posted benchmarks are here: http://www.xtremesystems.org/forums/showthread.php?t=239932

GPU-Z is probably only reporting as best it can, with a GPU only running 1/2 the time or so…

UPDATE: Now more confused than ever…

When testing the 295 on the OpenCL Test, my 280 Dedicated PhysX processor reports a 20% load…

When testing my 295 on the CUDA test, my 280 Dedicated PhysX processor reports varied light utilization.

When testing my 295 on the Compute Shader test, my 280 Dedicated PhysX processor reports mostly heavy utilization.

In summary, that means on the Sandra benchmark app:

When testing a 295 on OpenCL, utilization will be: 58% on 1/2 of my 295, and 19% on the other 1/2, with 20% load on the 280.

When testing a 295 on CUDA, utilization will be: 1/2 of my 295 runs at 99%, and the other 1/2 has intermittent processing issues causing GPU-z to not report correctly, with varied light utilization on the 280.

When testing a 295 on Compute Shaders, utilization will be: 1/2 of my 295 runs at 99%, and the other 1/2 at 0%, with mostly heavy usage on the 280.

Wow! Unexpected, and I have no idea what to think. :'(

I don’t really know how Sandra works; I’ve seen some very weird numbers from it (very low PCIe bandwidth numbers, for example). It also seems to TDR consistently on my Win7 box for some tests on a C1060, which doesn’t make a whole lot of sense.

I would love to know what is going on, and why my dedicated PhysX GPU is getting some processing action.

Makes me wonder where those calculations would be done if I wasn’t running a PhysX processor with Sandra?

I am starting to think this is the answer:

It just doesn’t sit right with me that Stream reports so much higher on the graph, especially when the CUDA test on my 295 shows 1/2 running at 99% and the other 1/2 having intermittent processing issues that cause GPU-Z to report 0% load, with varied light utilization on my dedicated PhysX processor.

I just get the feeling we are flying on 1 wing.

UPDATE: Another member of EVGA’s board ran the same tests as me in Sandra, running a single 295 with no dedicated PhysX processor.

http://www.evga.com/forums/tm.aspx?high=&a…p;mpage=1#58507

Thanks for the post freakysqeeky!

So for you…

OpenCL test: Both sides of your 295 were used with about 50% utilization.

CUDA test: 100% utilization on both sides of your 295.

Compute Shader test: 0% on one 1/2, and 100% utilization on your second.

And I was…

OpenCL test: 58% on 1/2 of my 295, and 19% on the other 1/2, with 20% load on the 280.

CUDA test: 1/2 of my 295 runs at 99%, and the other has intermittent processing issues causing GPU-z to not report it correctly, with varied light utilization on the 280.

Compute Shader test: 1/2 of my 295 runs at 99%, and the other at 0%, with mostly heavy usage on the 280.

Looks to me like…

With OpenCL: We both had one GPU at about 50%, and my other 50% is being split between the second 1/2 of my 295, and 280 dedicated PhysX processor (@ 19% and 20% utilization). (Your other 50% was all on the second 1/2 of your 295)

Odd…

With CUDA: We both had high utilization on both sides of our 295s, but 1/2 of my 295 had intermittent processing issues causing GPU-Z to not report it correctly, and some light work went to my 280. I wonder if the work that isn’t being done on the intermittent side of my 295 was being routed to my 280? That might be why I think the second 1/2 of my 295 isn’t processing 100% of the time in this test.

On the Compute Shader test: We matched up, with our 295s only using 1/2 of the GPU, but I still have some heavy processing going on with my 280.

Final conclusion: Driver issue?

Talonman, the quickest way to figure out what is going on would be to contact the GPU-Z guys and find out how they’re measuring GPU load and memory usage (e.g. what API are they using to get that information). They may be doing something strange/undocumented/unsupported which doesn’t play well with nVidia’s driver, or reports inaccurate information for either platform.

If Tim says that it’s not reliable info, I believe him; I’m just curious to know how they got the load monitoring to work (if it’s actually reporting accurate results).

I will do just that…

I will be a sport and give them a link too.

I will let you know if I get a response.

I found out the programmer for GPU-z is W1zzard on Extreme Systems.

I did PM him the question. :)

There is a GPU load counter that’s been in other tools (some NV system-wide monitoring tool had this same behavior as well); I think it’s just something exposed by NVAPI as generic GPU load. I don’t think there’s anything special about it, but there are reasons why I just haven’t used that counter and exposed it as a first-class citizen from CUDA or some other cross-platform library.

I can accept that my GPU’s utilization might not be reported very accurately by GPU-Z, but the fact that my dedicated PhysX processor is reporting a load too makes me think the driver is involved.

(Unless there is actually no load on my PhysX processor at all, and GPU-z is just plain old reporting bad info altogether.)

It did correctly report a 0% load running ATI-Tool, on my PhysX processor.

Two ways to measure load:

  1. Look at the temperature. When the GPU gets hot it is doing a lot of work, else not.
  2. Fire an empty kernel at some interval. If it does not return immediately, then the GPU is busy. Statistically, the work it was busy with takes about twice as long as the empty kernel had to wait.
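
To make the second idea concrete, here is a rough Python sketch of the bookkeeping only. The `probe` callable is hypothetical: it stands in for "launch an empty kernel and wait for it", which you would supply through whatever CUDA binding you use. The "twice" rule assumes the probes land at random moments inside kernels of roughly equal length, so treat the numbers as a coarse guess, not a measurement.

```python
import time
from statistics import mean


def sample_probe_latencies(probe, samples=50, interval_s=0.01):
    """Call probe() repeatedly and record how long each call blocks (seconds)."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()  # hypothetical: launch an empty kernel and wait for completion
        latencies.append(time.perf_counter() - start)
        time.sleep(interval_s)
    return latencies


def estimate_load(latencies, idle_latency_s, slack=2.0):
    """Return (busy_fraction, estimated_kernel_seconds).

    A probe counts as "busy" when it took longer than slack * idle_latency_s.
    The kernel-length guess is twice the mean extra wait, per the rule of
    thumb above (assumes similar-length kernels and random probe arrival).
    """
    if not latencies:
        raise ValueError("need at least one latency sample")
    threshold = idle_latency_s * slack
    waits = [t - idle_latency_s for t in latencies if t > threshold]
    busy_fraction = len(waits) / len(latencies)
    est_kernel_s = 2.0 * mean(waits) if waits else 0.0
    return busy_fraction, est_kernel_s
```

Measure `idle_latency_s` first with nothing else running on the card, then compare later samples against it. Probing at a fixed interval can line up with a periodic workload and skew the result, so jittering the interval is probably worth trying.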

Good thought…

I can’t do the kernel thing, as I am not a CUDA programmer. :)

But as far as temperature goes…

This is what my water-cooled 280 looks like right before running the CUDA test, at 33C:

It did rise a few C when I started the Sandra CUDA test, up to 37C, but then back down to 36C:

The same thing with the Compute Shader test, but up to 38C:

What I also find interesting is that GPU-Z reports my 280 as NOT a valid CUDA or Compute Shader device.

The odd thing is, the CUDA test, and Compute Shader test are the ones that generate the most processing action on my 280! Go figure… :o

But it is a valid OpenCL, and PhysX device.

It makes me wonder why OpenCL would have access to my dedicated PhysX processor, but CUDA wouldn’t.