OpenCL Visual Profiler Bug in profiling own code

Hi there,

yesterday I downloaded the OpenCL Visual Profiler to profile my own OpenCL kernel. The problem is that the profiler
always ends with an “Empty header line found in CSV files” or an “Error in reading the result” error message.

I’ve tried the profiler on the sdk examples and everything went just fine.
But after switching back to my kernel the problem remains the same.

The created CSV files are empty and … yeah, nothing happens.

My code simply copies a defined part of an image to another part
of another image… even if I strip the kernel down to only do a simple log computation,
it ends in the above error message.

Just to clarify once more:
I don’t use oclUtils or shrUtils… I do all the initialization of the code myself.
And the kernel works well under Linux and Snow Leopard.


Duh - was going to ask exactly the same question… Same symptoms: I’m able to profile SDK samples, but when I run my program in the profiler, “Error in reading profiler output.” is reported, and the CSV files generated by profiler are empty. The program is about image processing, and is doing its job properly (the resulting image is always generated, which means the kernel is run for sure). I had profiling turned on in my command queue, as I was printing the kernel execution time, but I turned it off now, still the same error is reported. The only unusual thing about the program is that it is linked with Qt library, albeit with QtCore subsystem only, as I’m using it to embed the kernel source code within the program executable.


I’ve got the same problem. I tried to look in the demo makefile and in the source code, but I didn’t find anything special. So I’m also interested in the answer. And by the way, is there another way to profile OpenCL code?


(I’m running under Linux, Fedora 10, GeForce 8600 GTS)

Struggled with the same problem. Thought I beat it, I get profiler output on a non-SDK project.
The project is a Visual C++ routine loading a program with two kernels and executing them on, I hardly dare say, a not supercool GeForce 8500 GT.
When I then use the same VC++ program to load another, but rather similar, OpenCL program, it refuses to do the job, claiming an error in reading profiler output, with empty .csv files as noticed by others as well.
There is no difference whether I read the .cl files or the .ptx files which I generated from the .cl files. In both cases one program works, the other does not. The output in the output window of the profiler is the same as when the program runs alone, i.e. it runs without errors of any kind. Further, the execution in the profiler takes about 10 seconds (both programs), much less than the 30 second exec time limit.
To make matters more interesting, the OpenCL program that profiles correctly is mine, or at least very much reworked; the one that won’t profile is a slightly simplified but virtually original oclNbodyKernel. There’s only one command queue used, and only one program is loaded at a time.
I have no inkling where to look further >.< . I assume no profiling info is generated, so I am interested to know what factors, in addition to setting CL_QUEUE_PROFILING_ENABLE on the command queue, determine this, or any suggestions you guys might make.

AMD Athlon 64 X2 4200+, Win7 64 build 7201, driver version 191.07, VS2008, Visual Profiler 1.02. The VC++ program does not use oclUtils etc.

So - can you post both “working” and “non-working” versions of your code, or at least the diff, so that maybe someone could try on different hardware and/or try to get some further clue on this issue?


Thanks for the interest.

Try an svn from

It is VC++, including the project file and the oclpj file.

Hope you’re not allergic to MS…


PS I owe you guys some explanation:

top line of the common.h file defines RUNRINGTEST. That will load and profiling should be ok.

Comment this line out and will be loaded instead, and profiling - at least in my case/setup - fails.

Question is of course, why?

Difference between the two programs is that they are called by different, but very similar functions in host-program.cpp (top of the file), ringtest has two kernels rather than 1.

In this setup the profiler-action is the same in both cases, just add a session with the recompiled executable.

The projectfile uses an environment variable OPENCL pointing at the common directory, e.g.

OPENCL=C:\Users\Jan\AppData\Local\NVIDIA Corporation\NVIDIA GPU Computing SDK\OpenCL\common

Hope this will allow you to recreate my problem.

Have added a little extra, i.e. measuring GPU time through clGetEventProfilingInfo(). This shows that profiling is available.

Also, I noticed that specifying an event in the command to be profiled (such as clEnqueueNDRangeKernel(), last param <> NULL) will cause the profiler to fail. See code, use GET_GPU_TIME in common.h. To get any result from the profiler, comment this line out.


Thanks for posting the link to your code (and sorry for not replying before, but that was because the forum was down). I’m not allergic to MS, but unfortunately I do not have a Windows installation available. So I tried to change your code to have it build under Linux, but it is just too much work… Today I installed the 3.0 SDK beta, in the hope that this issue with the profiler might be fixed, but unfortunately it is still there - the profiler handles all of the SDK samples I tried without any problem, but it keeps reporting “Error in reading profiler output” on my code. I tried to check what’s going on with strace, but early in the first run of any kind of program (my program, or any of the SDK samples I tried), strace segfaults. I tried a number of changes in my code, including no longer passing event parameters to clEnqueueNDRangeKernel(), but again to no avail. So - anyone (especially NVIDIA guys): is there any suggestion on how to further debug why the OpenCL profiler is not writing anything to the corresponding CSV files during the profiled program execution?

I have a similar problem. The latest Visual Profiler running on 32-bit Vista executes the program and displays “Error in reading profiler output”. If I select all settings it generates four CSV files in the working directory similar to this:




TIMESTAMPFACTOR 1179739c72b57725


The profiled application is running successfully to completion.

It looks like a wonderful tool with great potential, but it is broke. Is there an application log somewhere which indicates why Visual Profiler is generating empty log files?

I have a GTX 275 running BIOS with driver

It would be nice if NVIDIA could simply release the profiler source as an example.

I’ve had the same problem, but solved it by freeing / releasing all used OpenCL resources (memory, program, kernels). After that it worked :)

I love you,
Yes, It works for me.

thank you very much,


ps: ok, now we know who is not releasing resources at the end of their code ;-)

If the source was available or the tool logged the reasons for its inability to provide data then we would know. It may well be that it also functions as a resource leak detector :>). It’s free and quite useful when it works so I can’t complain too much… but source would be nice.

I also encountered the same problem
“Empty header line found in CSV files”

However, if I ran the profiler on SDK examples, everything worked fine. I am new to OpenCL and working my way through examples.

I modified the oclVectorAdd example to measure the times of my execution, which meant that I had to add a cl_event to pass to my clEnqueueNDRangeKernel call. However, I was naughty in not explicitly calling clReleaseEvent(eventname), which caused the profiler to output the error above. Similar to what others have found in this thread.

Just in case it may help others: if you create any OpenCL objects (events included), make sure you call release on them for your profiler to work. I know the gurus will say, of course you should call release/free at the end of your programs, but I know I am not always that good and careful …

Hope this helps, and thanks to those who posted their experiences, which helped me to find the bug in my code.


I am on a 32-bit machine, Ubuntu 9.10, driver 190.29 and OpenCL/profiler V1.0

OK, another sloppy loser to make a confession here: I was struggling (as can be verified from the date of my previous message on this thread) for three and a half months with this issue, and today when I noticed clReleaseEvent() mentioned - I tried to add it, and the profiler works fine with my program now. @sv650: thanks so much for mentioning it!

aha! that got me too. after calling clReleaseEvent, the profiler finally works.

(despite that, I would still call this a bug of the profiler, it should not be so sensitive to memory leaks. meanwhile, I would suggest putting this to the FAQ)

In addition, I found that if you call clEnqueueWriteBuffer WITHOUT passing an event (i.e. passing in NULL) the profiler will also refuse to work. The workaround is to declare a throwaway event and release it right after you write to the buffer (I reused the same cl_event object, then released it after I finished all my write calls).

I would assume other functions that accept events as their arguments might cause the same problem, but I ran into this one specifically.

Hope it helps someone else out there!


Sir, you are a lifesaver… I did have to pass an event and then release it for clEnqueueWriteBuffer (on Win 7).