Am I right that the visual profiler does not work with compute capability 1.0 cards? And is there a way to check whether my reads/writes are coalesced on a 1.0 card? I tried setting CUDA_PROFILE=1 and checking the cubin file, but I don’t see anything about coalescing in it. Thanks in advance.
Maybe you should read the release notes? The profiling results are not in a .cubin file…
I have used both the non-visual and the visual debugger on an 8800 GTX (a 1.0 card) without any problems.
I know the profile results are supposed to be in cuda_profile.log. I have looked at this log file, but it only contains gputime, cputime and occupancy information. How can I check coalescing? I assume you mean profiler, not debugger. The visual profiler won’t run my code. I’ve already read the release notes. Is there something I’m missing?
Please read the docs as suggested; there is nothing we can tell you that is not already spelled out there. There is a file called CUDA_Profiler_2.0.txt in the doc/ directory installed by the toolkit. In it you will find:
................
CUDA_PROFILE_CONFIG
is used to specify a config file for enabling performance counters
in the GPU. See the next section for configuration details.
Profiler Configuration
----------------------
This version of the Cuda profiler supports configuration options that allow
users to gather statistics about various events occurring in the GPU during execution.
These events are tracked with hardware counters on signals in the chip.
The followings options/signals are supported:
timestamp
---------
This option tells the profiler to log timestamps before kernel
launches and memory operations so the user can do timeline analysis.
gld_incoherent
gld_coherent
gst_incoherent
gst_coherent
--------------
These options tell the profiler to record information about whether global
memory loads/stores are coalesced (coherent) or non-coalesced (incoherent).
................
And much more. In short, you need to create a config file that lists the signals you want to record, set the proper environment variables (CUDA_PROFILE=1 to enable profiling and CUDA_PROFILE_CONFIG pointing at the config file), and then read the numbers from cuda_profile.log.
You may also be interested in the visual profiler which automates this process for you with a GUI: http://forums.nvidia.com/index.php?showtopic=58283
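If you prefer to stay with the command-line profiler, the three steps above (config file, environment, reading the log) can be scripted. The sketch below is written in Python and rests on assumptions that the quoted docs do not spell out: the config file holds one signal name per line, the log is in the comma-separated layout of the excerpt posted further down in this thread, and all helper names (write_profiler_config, profiler_env, run_profiled, incoherent_fraction) are hypothetical. It writes a config listing the four coalescing signals, builds an environment with CUDA_PROFILE=1 and CUDA_PROFILE_CONFIG set, and reports for each kernel row the fraction of global loads counted as incoherent. If your log uses a different layout, the parsing part would need adjusting.

```python
import csv
import os
import subprocess

# Signal names as listed in CUDA_Profiler_2.0.txt.
COALESCING_SIGNALS = ["gld_incoherent", "gld_coherent",
                      "gst_incoherent", "gst_coherent"]


def write_profiler_config(path, signals):
    """Write the config file, one signal per line (assumed format)."""
    with open(path, "w") as f:
        for name in signals:
            f.write(name + "\n")


def profiler_env(config_path, base_env=None):
    """Copy the environment and switch the command-line profiler on."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_PROFILE"] = "1"
    env["CUDA_PROFILE_CONFIG"] = os.path.abspath(config_path)
    return env


def run_profiled(cmd, config_path):
    """Run the CUDA app (cmd is a list such as ["./myapp"]) with profiling on."""
    return subprocess.run(cmd, env=profiler_env(config_path), check=True)


def incoherent_fraction(log_path):
    """Return (method, fraction of global loads that were incoherent) per row.

    Assumes a comma-separated log with a 'method,...' header like the excerpt
    in this thread; lines starting with '#' are skipped as comments (assumption).
    """
    results = []
    with open(log_path, newline="") as f:
        rows = csv.DictReader(line for line in f if not line.startswith("#"))
        for row in rows:
            coherent = row.get("gld_coherent")
            incoherent = row.get("gld_incoherent")
            if not coherent or not incoherent:
                continue  # e.g. memcopy rows carry no counter values
            total = float(coherent) + float(incoherent)
            fraction = float(incoherent) / total if total else 0.0
            results.append((row["method"], fraction))
    return results
```

Typical use would be write_profiler_config("profile.cfg", COALESCING_SIGNALS), then run_profiled(["./myapp"], "profile.cfg") with ./myapp as a placeholder for your binary, then incoherent_fraction("cuda_profile.log"). A fraction close to 0 suggests the loads of that kernel were coalesced.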
Aaaarrrgghhh, yes, profiler. It’s just that 90% of my brain cells are burning all their cycles hoping for the debugger ;)
As far as I remember, you have to add the signals you want measured to a config file and set an environment variable to the location of that config file. The details should be in a text file on your system.
As far as I know, the visual profiler runs any code that the command-line profiler does, so what is going wrong with it? (The command-line profiler is much, much more cumbersome to work with.)
When I try to run the visual profiler, I get "Error -91 in reading profiler output. Empty data for ‘gputime’ column in profiler output file", even though there are times recorded in the cuda_profile.log.
Do I still need a config file with the visual profiler, or does the visual profiler set that up for me?
Edit: it looks like the temporary profiler config file and the CSV file are being created by the visual profiler. I set all the environment variables mentioned in CUDA_Profiler_2.0.txt. Now I get an error:
Error -88 in reading profiler output.
Empty data for 'cputime' column in profiler output file.
Perhaps it is caused by the memcopy lines in the CSV file?
method,gputime,cputime,occupancy,gld_incoherent,gld_coherent,gst_incoherent,gst_coherent
memcopy,2493.728
memcopy,74.944
memcopy,18.752
Edit 2: It seems the visual profiler is confused by the memcopy lines, and my log file is partially corrupted, as some of the lines are incomplete. If I remove the memcopy lines and the broken lines from the CSV file, I can import it into the visual profiler. How can I stop the profiler from outputting the memcopy lines?
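Whether or not the profiler can be told to skip them, the manual cleanup described in this edit could be automated before importing. A minimal Python sketch (the function name is hypothetical, and it only handles the two problems reported here) that drops memcopy rows and any row whose field count or contents do not match the header. Passing '#' lines through unchanged is an assumption about how comment lines, if any, should be treated.

```python
import csv
import io


def clean_profiler_csv(text):
    """Return CSV text without memcopy rows or incomplete rows.

    Lines starting with '#' are passed through unchanged (an assumption
    about comment lines); blank lines are dropped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for line in text.splitlines():
        if line.startswith("#"):
            out.write(line + "\n")
            continue
        fields = next(csv.reader([line]), [])
        if not fields:
            continue  # blank line
        if header is None:
            header = fields  # first real row is the column header
            writer.writerow(fields)
            continue
        if fields[0] == "memcopy":
            continue  # the rows the visual profiler chokes on here
        if len(fields) != len(header) or not all(f.strip() for f in fields):
            continue  # truncated or partially written line
        writer.writerow(fields)
    return out.getvalue()
```

To use it, read the CSV file produced by the profiler, pass its text through clean_profiler_csv, and write the result to a new file for import. Note that this only hides the symptom: dropping lines also drops whatever those lines would have told you.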
Well, it would be even easier to just let the visual profiler do all the work. You won’t have to set any environment variables or tweak any config files. Importing should also work, but I have never tried it, so I don’t know whether there is an option to suppress the memcopy output. I don’t think there is such an option (and I would consider it a bug in the visual profiler if it cannot handle those lines).
Anyhow, if you use only the visual profiler, you should not have such problems, except with programs that change the current working directory before running CUDA code. In that case it is smart to change back to the original directory before calling your kernel(s), because the visual profiler expects the output files in that directory.
A lot of people on the forums see this error, though I’ve never seen it myself. I think it can be caused by a program that waits for a keypress before exiting, or by a change of working directory, as E.D. Riedijk pointed out.
The visual profiler will generate the config for you. You can select more than 4 signals, and it will rerun your app several times to measure groups of 4 signals independently. I find that this works best when the app always calls the same number of kernels in the same order, so that the multiple runs can be merged line by line.
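To see why a stable launch order matters, here is a rough Python sketch of the batch-and-merge idea described above. It is not how the visual profiler is actually implemented; the four-signals-per-run limit comes from this post, while the helper names and the row representation (one dict per logged line, with a 'method' key) are assumptions. Rows from separate runs are matched purely by position, so runs that log a different number of lines or launch kernels in a different order are rejected rather than silently mis-merged.

```python
def signal_groups(signals, per_run=4):
    """Split the requested signals into batches, one batch per app run."""
    return [signals[i:i + per_run] for i in range(0, len(signals), per_run)]


def merge_runs(runs):
    """Merge per-run profiler rows line by line.

    `runs` is a list of runs; each run is a list of dicts holding a 'method'
    key plus that run's counters. Rows are matched by position only.
    """
    if not runs:
        return []
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("runs recorded a different number of launches")
    merged = []
    for rows in zip(*runs):
        methods = {row["method"] for row in rows}
        if len(methods) != 1:
            raise ValueError("launch order differs between runs: %s" % sorted(methods))
        combined = {}
        for row in rows:
            combined.update(row)  # counters from each run land in one row
        merged.append(combined)
    return merged
```

With six signals, for example, signal_groups yields one batch of four and one of two, so the app would be run twice and the two logs merged row by row.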
Weird. I’ve never seen this before either. To check, I just downloaded the latest profiler 1.0 (I haven’t profiled since I upgraded to CUDA 2.0) and ran it on my app, which does lots of memcpys. I just set up the arguments and the working directory, clicked Go, and everything worked, memcpys included.
I know it’s probably frustrating to hear me say “it worked for me”, but it does work for me with an extremely complicated app. Perhaps you could post a small test .cu file (to be compiled with nvcc -o test test.cu) that reproduces the problem. Then we could all try the same test case and narrow it down to the root cause.
Other ideas:
Maybe you have a mismatch between the CUDA profiler version and the CUDA version? I don’t think the profiler output format has changed since 1.1, but it may have. Try CUDA 2.0 with the latest profiler 1.0, available from http://www.nvidia.com/object/cuda_get.html
Maybe it is a platform-specific issue? I’m running on x86_64 Linux. What platform are you on? I could switch to that platform and see whether I get the same issue there.
Does your application run on more than one GPU? I just tried the profiler on a multi-GPU app for another forum post and found that the visual profiler doesn’t like the profile output from multi-GPU apps: it complained about a missing gputime column once and a missing occupancy column the second time.
I am running Fedora 8 x86_64 Linux, with profiler version 1.0.11, the CUDA 2.0 beta2 SDK and NVIDIA driver 177.67. My app is a multi-GPU application, so I guess that is the problem. Thanks. :)