CUDA profiler options

Sorry if this has already been answered, but I can’t find it offhand in the forum or the readme files. I’m working on a remote CUDA system where I can’t really run the visual profiler, but I need to get all those different performance counter values. I have set the CUDA_PROFILE environment variable to 1 and I have a config file in the same directory as the program, but the profile output contains only the CPU and GPU time for each function. I tried moving the config file to my home directory, but the profiler still doesn’t seem to read it.

It would be great if someone could point me to the format of the config file and where exactly it should be placed.


You also need to set CUDA_PROFILE_CONFIG to point to the config file you created.

It’s been a while since I set up a profile config by hand, but I’m pretty sure you just list the options you want, one per line.

Note that you can only have 4 of the performance counters active at one time. (You can comment out lines with # to make it easier to switch between different active sets within the same file.)
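
For anyone who wants to script this setup, here is a rough Python sketch of the steps described above: write a config with one option per line, keep spare counters as # comments, and set the two environment variables. The helper names (write_profile_config, profile_env) and the file name profile_config.txt are made up for illustration. The counter names are the ones that show up elsewhere in this thread, so check them against the profiler documentation for your toolkit version.

```python
import os

# Limit quoted in this thread: only 4 performance counters at a time.
MAX_ACTIVE_COUNTERS = 4


def write_profile_config(path, active, disabled=()):
    """Write one option per line; disabled options are kept as '#' comments."""
    if len(active) > MAX_ACTIVE_COUNTERS:
        raise ValueError(
            "at most %d counters can be active at once" % MAX_ACTIVE_COUNTERS
        )
    lines = list(active) + ["# " + name for name in disabled]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


def profile_env(config_path, base_env=None):
    """Return a copy of the environment with the profiler variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_PROFILE"] = "1"
    # An absolute path avoids depending on the program's working directory.
    env["CUDA_PROFILE_CONFIG"] = os.path.abspath(config_path)
    return env


# Hypothetical usage (./my_app stands in for your own executable):
#   import subprocess
#   write_profile_config(
#       "profile_config.txt",
#       ["gld_coherent", "gld_incoherent", "gst_coherent", "gst_incoherent"],
#       disabled=["warp_serialize"],
#   )
#   subprocess.call(["./my_app"], env=profile_env("profile_config.txt"))
```

Keeping the inactive counters in the file as comments means switching sets is just a matter of moving names between the two lists.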

This is all documented in /$cuda_install_location/doc/CUDA_Profiler_2.2.txt

Thanks a lot for the info. It seems my installation of CUDA is missing that readme file.

When I ran with the profiler, I got zero incoherent reads from global memory, which seems quite odd to me given that I haven’t done any memory access optimizations yet.

gld_coherent=[ 1063620837 ] gld_incoherent=[ 0 ] gst_coherent=[ 31400 ] gst_incoherent=[ 0 ]

Also, warp_serialize = 0. Does that mean all threads in a warp are executed in parallel?

Am I missing something here?
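
If you end up comparing many runs, a small script can pull the counters out of the log instead of reading them by eye. This is only a sketch: it assumes every field has the name=[ value ] shape seen in the counter line above, and parse_counters is a made-up helper name, not part of any CUDA tool.

```python
import re

# Matches fields of the form name=[ value ], as in the profiler output above.
FIELD_RE = re.compile(r"(\w+)=\[\s*([^\]]*?)\s*\]")


def _convert(text):
    """Return an int or float when the text is numeric, else the raw string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_counters(line):
    """Turn one profiler log line into a {name: value} dictionary."""
    return {name: _convert(value) for name, value in FIELD_RE.findall(line)}


line = ("gld_coherent=[ 1063620837 ] gld_incoherent=[ 0 ] "
        "gst_coherent=[ 31400 ] gst_incoherent=[ 0 ]")
counters = parse_counters(line)
for name in sorted(counters):
    print(name, counters[name])
```

With the values in a dictionary it is straightforward to sum them per kernel or compare them between runs.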

I guess you got lucky then and all your memory accesses are coalesced :)

from the docs:
warp_serialize : Number of thread warps that serialize on address conflicts to either shared or constant memory

Rather than continually quoting the documentation for every new question, I will just attach it here. Although, if your installation is missing the documentation then that is a major problem as you really can’t do anything in CUDA without reading it first!
CUDA_Profiler_2.2.txt (9.89 KB)

GT200 doesn’t report incoherent loads/stores. Instead, you’ll have to use the transaction counters.

Thanks Mister for the document and tmurray for the clarification. It’s really helpful.

Is there a way to use the profiling counters to determine whether a kernel is idling while it waits for memory transactions to finish, or whether it is compute bound?