cudaprof questions

peastman · February 3, 2009, 12:59am

Hi,

I’m using CUDA 2.1 under Fedora 10 with a GTX295. I have some questions about cudaprof 1.1.

First, in the “Session Settings” window, the checkboxes for “gld uncoalesced” and “gst uncoalesced” are disabled, and I can’t find any way to make them be enabled. Is this a feature that hasn’t been implemented yet, or is there some way I can get it to count uncoalesced reads and writes?

Second, what is the meaning of the “cta launched” value? The manual gives the completely unhelpful definition, “Number of CTAs launched on the PM TPC.” Since the CUDA programming guide makes no mention of what a “CTA” or “TPC” is, I have no idea what this means.

Thanks!

Peter

MisterAnderson42 · February 3, 2009, 2:34am

Those counters are not active on G200 based GPUs. NVIDIA still hasn’t updated them to work. I would guess that the profiler has detected that you are on a G200 GPU and has deactivated them, though I’m not positive on that one.

This is just a different lingo that the graphics and really low level hardware guys use.

CTA = block

TPC = texture processing cluster (group of two or three multiprocessors)

PM = performance measured??? I don’t know on this one

Basically, ignoring the lingo, only one multiprocessor has the performance counters. So, if you launch 1000 blocks and your calculation load is pretty even, cta_launched should average out to around 1000/30 (each GPU on the GTX 295 has 30 multiprocessors). It really isn’t that useful a counter unless you want to calculate a standard deviation or something and see how poorly your load is balanced among blocks.
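
The load-balance check described above can be sketched in a few lines of Python. This is only an illustration: the block and multiprocessor counts come from the post, while the function names and the sample cta_launched readings are made up.

```python
import statistics

def expected_cta_launched(total_blocks, num_multiprocessors):
    """Blocks the instrumented multiprocessor would see under a perfectly even load."""
    return total_blocks / num_multiprocessors

def load_balance_summary(cta_counts):
    """Mean and sample standard deviation of cta_launched over several launches."""
    mean = statistics.mean(cta_counts)
    spread = statistics.stdev(cta_counts) if len(cta_counts) > 1 else 0.0
    return mean, spread

# 1000 blocks over 30 multiprocessors, as in the post.
expected = expected_cta_launched(1000, 30)

# Hypothetical cta_launched readings from five launches of the same kernel.
samples = [33, 34, 33, 35, 32]
mean, spread = load_balance_summary(samples)
print(f"expected {expected:.1f}, measured mean {mean:.1f}, std dev {spread:.2f}")
```

A standard deviation that is large relative to the mean would suggest the blocks are not being spread evenly, which is the only case where this counter tells you much.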

peastman · February 3, 2009, 6:49pm

Thanks! I also have access to another computer with an 8800M running Windows XP, so I can do my profiling on that computer if necessary.

So I just installed the Cuda Profiler on that computer and tried it out. Sure enough, those checkboxes are enabled. But when I actually run my program, the only columns that appear in the output are Method, GPU Time, CPU Time, and Occupancy. That’s it. I’ve checked every check box there is to check in the Session Settings window, but none of the other values are actually getting reported. What am I missing?

Peter

MisterAnderson42 · February 3, 2009, 6:58pm

Odd. You should be getting at least something. Check the status window at the bottom. For any column that is reported as all 0’s, the profiler removes it from the main window display and prints something like "column ‘local load’ having all zero values is hidden."

Unless your kernel reads/writes nothing from/to global mem, there has to be at least something non-zero in some of the counters.

Open the session settings and double check that all the boxes are checked before clicking play. I just opened the profiler to test on my system and loaded a previous project. The session settings had gone back to the defaults of just GPU time, CPU time, and occupancy.

peastman · February 3, 2009, 8:44pm

No, it’s not reporting having hidden anything.

Ok, I did the following:

1. Open session settings and check all the checkboxes.
2. Click OK.
3. Bring up session settings again and verify that all the checkboxes are still checked.
4. Click Start. It runs and produces only the four columns I mentioned.
5. Bring up session settings yet again. None of the checkboxes are checked anymore.

Curiously, if I click OK and then start it from the main window, rather than clicking Start in Session Settings, the checkboxes do not get cleared. But it still only produces those columns.

Peter

MisterAnderson42 · February 3, 2009, 11:42pm

Well, I’m about out of ideas. The only thing I’ve got left is: your app isn’t multi-GPU, perchance? The v1.0 profiler would get data parsing errors on applications that opened up more than one GPU context. Maybe they “fixed” the problem in 1.1 by turning off all the counters if this is detected.

peastman · February 4, 2009, 6:16pm

No, just one GPU in that machine and the app only uses one. And profiling the same application under Linux works fine (except that I don’t get counts for uncoalesced loads and stores).

Oh well, thanks for your help. It’s becoming clear to me that CUDA still isn’t as mature as one might like. :(

Peter

E.D_Riedijk · February 4, 2009, 6:46pm

Actually the profiler is one of those things that is working quite well.

Does your program run more than once or only one time?

When selecting all counters, your program should run 3 times to profile all of them. It might be that you have your maximum runtime set too low and the profiler quits after that maximum time. Then it will only have data from the first run.
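
One way to picture why several runs are needed: only a few hardware counters can be collected per run, so the selected counters have to be split into batches. The sketch below assumes a limit of four counters per run, which is what the command-line profiler documentation of that era states; whether cudaprof 1.1 batches in exactly this way is an assumption, and the counter names are illustrative.

```python
def split_into_passes(counters, per_pass=4):
    """Group counters into batches of at most per_pass, one batch per program run."""
    return [counters[i:i + per_pass] for i in range(0, len(counters), per_pass)]

# Illustrative counter names; the exact set offered by cudaprof 1.1 may differ.
selected = [
    "gld_incoherent", "gld_coherent", "gst_incoherent", "gst_coherent",
    "local_load", "local_store", "branch", "divergent_branch",
    "instructions", "warp_serialize", "cta_launched",
]

passes = split_into_passes(selected)
for run, batch in enumerate(passes, start=1):
    print(f"run {run}: {', '.join(batch)}")
```

With eleven counters and four per run this gives three batches, which is consistent with the three runs mentioned above.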

peastman · February 5, 2009, 7:52pm

It runs three times. Each run takes only about five seconds, and all of them complete successfully. I have the maximum execution time set at 30 seconds.

It really seems like the other statistics just aren’t being generated. Maybe it’s just a coincidence, but it does seem striking that the only four columns that appear are the ones that you get in a profile log by setting CUDA_PROFILE=1. Are there different mechanisms used to generate different statistics? It seems like only one of those mechanisms is working.
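
One way to test that hypothesis is to bypass the GUI and drive the command-line profiler directly through its environment variables. The sketch below is a minimal example, assuming `CUDA_PROFILE_LOG` and `CUDA_PROFILE_CONFIG` behave as described for the command-line profiler (a log file path, and a config file with one counter name per line). The application path `./my_app`, the file names, and the helper functions are hypothetical.

```python
import os
import subprocess

def write_counter_config(path, counters):
    """Write one counter name per line, the layout assumed for the config file."""
    with open(path, "w") as f:
        f.write("\n".join(counters) + "\n")

def profiler_env(log_path, config_path, base_env=None):
    """Return a copy of the environment with the profiler variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_PROFILE"] = "1"                 # enable profiling
    env["CUDA_PROFILE_LOG"] = log_path        # where the log is written
    env["CUDA_PROFILE_CONFIG"] = config_path  # which counters to collect
    return env

def run_profiled(command, counters, log_path="profile.log", config_path="profile.cfg"):
    """Run command once with the given counters enabled and return the log text."""
    write_counter_config(config_path, counters)
    subprocess.run(command, env=profiler_env(log_path, config_path), check=True)
    with open(log_path) as f:
        return f.read()

# Example with a hypothetical application path:
# print(run_profiled(["./my_app"], ["gld_incoherent", "gst_incoherent"]))
```

If a log produced this way also contains only the four basic columns, the problem probably lies below the GUI. If the extra counters do show up, the GUI is the more likely culprit.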

Peter

E.D_Riedijk · February 5, 2009, 8:24pm

Well, here is someone else who has no clue anymore. Maybe you can pm mfatica or tmurray to see if they can help you find the reason.
