Is there any tool which can tell me whether my kernel is compute bound or memory bound?

A tool which works for any CUDA program and tells me whether it is memory bound or compute bound.

Thanks for the help

Not as such, as far as I know. You have the profiler, which will tell you how effective your loads/stores are (coalescing, cache misses, etc.), and Nexus, which will tell you how hard the GPU is working.

You can also comment out the compute part of the kernel to see how much time the memory traffic alone takes.
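A minimal sketch of that idea, with a hypothetical element-wise kernel: the arithmetic is compiled out with a preprocessor switch, but the loaded value is still written back so the compiler cannot remove the loads. Build once with nvcc as-is and once with -DMEMORY_ONLY and compare the times.

[indent][font=“Courier New”]
#include <cuda_runtime.h>

// Hypothetical kernel: define MEMORY_ONLY to strip the arithmetic and keep
// only the global loads/stores, so the runtime approximates memory traffic alone.
__global__ void scale_kernel(const float* x, float* y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
#ifndef MEMORY_ONLY
        // the "compute stuff" -- replace with your kernel's real math
        for (int k = 0; k < 64; ++k)
            v = a * v + 0.5f;
#endif
        y[i] = v;   // storing v keeps the load from being optimised away
    }
}

int main()
{
    const int n = 1 << 22;
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    scale_kernel<<<(n + 255) / 256, 256>>>(d_x, d_y, 1.001f, n);
    cudaDeviceSynchronize();
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
[/font][/indent]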

The usual approach is to estimate how much memory traffic the kernel generates relative to the card's peak bandwidth, and how much arithmetic it does relative to peak GFLOPS, and then see which of the two fractions is higher.
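A back-of-envelope sketch of that estimate; the peak numbers and per-element counts here are placeholders you would fill in for your own card and kernel:

[indent][font=“Courier New”]
#include <cstdio>

// Rough bound check: compare the time the memory traffic alone would take at
// peak bandwidth with the time the math alone would take at peak GFLOPS.
// The larger of the two is the likely limiter.
int main()
{
    const double peak_bw_GBs   = 100.0;          // placeholder: your card's peak bandwidth
    const double peak_gflops   = 900.0;          // placeholder: your card's peak GFLOP/s
    const double bytes_moved   = 3.0 * 4 * 1e7;  // e.g. 3 floats per element, 1e7 elements
    const double flops_done    = 20.0 * 1e7;     // e.g. 20 FLOPs per element

    const double t_mem  = bytes_moved / (peak_bw_GBs * 1e9);
    const double t_math = flops_done  / (peak_gflops * 1e9);

    printf("memory-limited time  : %g s\n", t_mem);
    printf("compute-limited time : %g s\n", t_math);
    printf("kernel looks %s bound\n", t_mem > t_math ? "memory" : "compute");
    return 0;
}
[/font][/indent]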

If you can find a tool that lets you adjust the clock rates of your card, then I’ve seen people mention a neat trick to test this. Benchmark your code, then turn the core and shader clock down 15% and rerun the benchmark. Similarly, put the core and shader clock back to nominal, turn down the memory clock by 15%, and run once more. By comparing the runtimes of the three tests, you should get a good sense of whether your code is compute or memory bound: whichever clock reduction slows your code down roughly in proportion is the resource you are bound by.
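For the benchmarking step itself, a cudaEvent-based timing harness along these lines (the kernel and launch configuration are placeholders) gives consistent numbers across the three clock settings:

[indent][font=“Courier New”]
// Minimal kernel timing harness using CUDA events; run it once at nominal
// clocks, once with the memory clock reduced, and once with core+shader
// reduced, and compare the three times.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void my_kernel(float* data, int n)      // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 22;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    my_kernel<<<(n + 255) / 256, 256>>>(d_data, n);   // warm-up launch
    cudaEventRecord(start);
    my_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
[/font][/indent]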

The aggregate memory throughput counters in the Visual Profiler are pretty useful too (although the ones in the Linux 3.0 release of the profiler are broken, at least for my compute 1.3 hardware). You can see the read and write throughput and compare it to the specs for your card. On all the hardware I have tried, it is usually possible to hit about 90% of the theoretical bandwidth.
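If you prefer to compute the number yourself rather than read it off the profiler, the effective bandwidth is just the bytes read plus bytes written divided by the kernel time; the element counts, measured time, and spec-sheet peak below are placeholders:

[indent][font=“Courier New”]
#include <cstdio>

// Effective bandwidth from a measured kernel time: bytes read plus bytes
// written, divided by the elapsed time. Compare against the spec-sheet peak.
int main()
{
    const double n             = 1e7;         // elements processed (placeholder)
    const double bytes_read    = n * 2 * 4;   // e.g. two float loads per element
    const double bytes_written = n * 1 * 4;   // e.g. one float store per element
    const double kernel_ms     = 1.8;         // measured with a timing harness
    const double peak_GBs      = 70.4;        // placeholder: spec-sheet bandwidth

    const double eff_GBs = (bytes_read + bytes_written) / (kernel_ms * 1e-3) / 1e9;
    printf("effective bandwidth: %.1f GB/s (%.0f%% of peak)\n",
           eff_GBs, 100.0 * eff_GBs / peak_GBs);
    return 0;
}
[/font][/indent]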

Great tip! This one goes into my list.

You could probably even automate this with a launcher program that uses NVAPI.

Something like:

[indent][font=“Courier New”]

nvscale --auto cuda-program

nvscale --memory -15 cuda-program

nvscale --core+shader -15 cuda-program

[/font][/indent]

Where [font=“Courier New”]--auto[/font] runs the program three times, scaling the memory and core/shader clocks just as you describe.
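A rough sketch of what such a launcher's timing loop could look like; the clock-scaling calls themselves are platform-specific (NVAPI on Windows, nvidia-settings on Linux) and are only marked as placeholders here, and [font=“Courier New”]nvscale[/font] is of course a hypothetical tool name:

[indent][font=“Courier New”]
// Skeleton of a launcher that runs the target program three times and
// reports wall-clock times; the actual clock changes are left as TODOs.
#include <chrono>
#include <cstdio>
#include <cstdlib>

static double timed_run(const char* cmd)
{
    auto t0 = std::chrono::steady_clock::now();
    std::system(cmd);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main(int argc, char** argv)
{
    if (argc < 2) { std::printf("usage: nvscale <cuda-program>\n"); return 1; }

    // TODO: set nominal clocks here
    double t_nominal = timed_run(argv[1]);

    // TODO: lower the memory clock by ~15% here
    double t_mem = timed_run(argv[1]);

    // TODO: restore memory clock, lower core+shader clocks by ~15% here
    double t_core = timed_run(argv[1]);

    std::printf("nominal: %.2f s, -15%% memory: %.2f s, -15%% core+shader: %.2f s\n",
                t_nominal, t_mem, t_core);
    return 0;
}
[/font][/indent]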

Good one, Thanks

This is a much simpler method to test the program, thank you.

Down-clocking sounds like an awesome idea indeed. I’m on Linux and haven’t done any under/over-clocking before. One possible approach seems to be enabling the corresponding controls in the nvidia-settings application by turning the “Coolbits” option on in my xorg.conf file - is this the right tool to use under Linux? Also, since I can only change the core clock this way: does the “shader” clock scale with the core clock? (On my Quadro FX 770m the core clock is 500 MHz and the “shader” clock is 1.25 GHz, so if I decrease the core clock by 15%, does the “shader” clock drop by 15% too?) Further: are any changes needed under the “PowerMizer” section? I have “adaptive” mode turned on there, and I noticed that as soon as I start any kind of CUDA application the clocks go up to their maximum values - but would it still be worth changing the preferred mode to “prefer maximum performance” while doing the measurements, just to be sure?
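For reference, the Coolbits change described above goes in the Device section of xorg.conf; the exact value required depends on the driver version (older drivers used “1” to expose the clock controls in nvidia-settings), so treat this as an illustrative sketch:

[indent][font=“Courier New”]
# /etc/X11/xorg.conf (Device section) -- exposes the clock controls in
# nvidia-settings; the Coolbits value required varies with driver version.
Section "Device"
    Identifier "NVIDIA GPU"
    Driver     "nvidia"
    Option     "Coolbits" "1"
EndSection
[/font][/indent]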