Hi,
I am trying to measure the time taken to compile OpenCL code through NVIDIA’s OpenCL compiler on Windows. Does clBuildProgram handle the whole job (compiling & linking), or does it offload some of its work to clEnqueueNDRangeKernel, clCreateKernel, or some other API?
I’m not sure about this, and could not find any mention of it in the spec, but I vaguely remember reading somewhere that clBuildProgram may use background threads to do the actual compiling and linking. If that is the case, then any call to the cl*Kernel functions would probably have to wait for the compilation to finish. But as I said, I’m not sure where I got this from.
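One thing the spec does say: clBuildProgram() takes an optional notification callback, and only with a NULL callback is it required to block until the build has completed. So if you want guaranteed blocking behavior for timing, something like this sketch (untested; `program` assumed to be a valid cl_program, error checking omitted):

#include <stdio.h>
#include <CL/cl.h>

/* Called by the runtime once the build has finished. */
static void CL_CALLBACK build_done(cl_program prog, void *user_data)
{
    printf("build finished\n");
}

void build_blocking(cl_program program)
{
    /* NULL callback: clBuildProgram() must block until the build completes,
       so wrapping this call in a timer captures the full compile time. */
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
}

void build_async(cl_program program)
{
    /* Non-NULL callback: the call may return immediately and build on a
       background thread; build_done() fires on completion. */
    clBuildProgram(program, 0, NULL, NULL, build_done, NULL);
}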
I’ve timed compilation in the past by timing only the call to clBuildProgram(), and that seemed to work fine. However, if you call clBuildProgram() multiple times in a row, subsequent calls may take less time because the compiler itself no longer needs to be loaded into memory. You could try unloading the compiler in between for consistent numbers by calling clUnloadCompiler(), but that is only a hint to the implementation. The better way is a dummy call to clBuildProgram() to make sure the compiler is already loaded for all measured compilations, and then to run those compilations quickly after one another (the compiler might unload after clBuildProgram() has not been called for a certain amount of time).
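As a sketch of that dummy-build idea (untested; `dummy` and `program` assumed to be valid cl_program objects created with clCreateProgramWithSource(), error checking omitted):

#include <stdio.h>
#include <time.h>
#include <CL/cl.h>

void time_build(cl_program dummy, cl_program program)
{
    /* Throwaway build so the compiler is already loaded into memory. */
    clBuildProgram(dummy, 0, NULL, NULL, NULL, NULL);

    /* Now time only the build of interest. */
    clock_t t0 = clock();
    clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    clock_t t1 = clock();

    printf("clBuildProgram() took %.3f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC);
}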
Also, to be on the safe side, you could try to force-trigger the compilation by requesting the binaries via clGetProgramInfo() / CL_PROGRAM_BINARIES as part of the timing …
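That is, something along these lines (a sketch assuming a single device; error checking omitted):

#include <stdlib.h>
#include <CL/cl.h>

/* Requesting the device binary forces any deferred compilation to finish
   before the call returns, so include this in the timed region. */
void force_binary(cl_program program)
{
    size_t size = 0;  /* one entry per device; one device assumed here */
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL);

    unsigned char *bin = (unsigned char *)malloc(size);
    unsigned char *bins[1] = { bin };
    clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(bins), bins, NULL);
    free(bin);
}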
I suspect determining these numbers is not too simple, especially if you do not have information on how NVIDIA’s runtime works internally.
I know for sure only that Apple’s and AMD’s platforms both cache intermediate code (Apple’s platform even caches too aggressively, not regenerating PTX for changed preprocessor macros and so forth). For NVIDIA I can only suspect it, but caching of the GPU instruction stream generated from PTX might even be outside of OpenCL’s influence and handled in the CUDA backend.
If anyone is willing to measure and check all that, I’d be curious myself, but I am not willing to spend the time, especially as the behavior may change with every new OpenCL and CUDA driver release.
So, do I understand you correctly: if I have a file with several kernels, I ought to generate a separate file for each kernel, and instead of compiling one program with several kernels, compile the programs sequentially, one after another (one per kernel)? This should minimize the compilation time?
(And what did you mean by “multiple times in a row”?)
Another question: how have you timed clBuildProgram()? Did you use some OpenCL profiling function/tool, or just clock()?
Well, what I was seeing was this: if I compile OpenCL source code and immediately afterwards compile the same source code again, the second compilation takes less time than the first one. This is probably due to two things: 1) the first call loads the compiler itself into memory, which doesn’t need to be done again for the second call; 2) the driver might cache compilation results and not actually compile anything if the result for some source code is already known.
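If you want to rule out 2) in your measurements, one trick (my own workaround, not anything NVIDIA documents, and it assumes the driver keys its cache on source plus build options) is to make every build unique via a changing, unused preprocessor define:

#include <stdio.h>
#include <CL/cl.h>

/* Inject a changing define that the kernel source never references,
   so each build looks unique to a potential compilation cache. */
void build_uncached(cl_program program, int iteration)
{
    char options[64];
    snprintf(options, sizeof(options), "-D CACHE_BUSTER=%d", iteration);
    clBuildProgram(program, 0, NULL, options, NULL, NULL);
}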
If the second source code is different from the first, you’ll still benefit from the compiler already being loaded into memory. However, the time you save should be negligible compared to the compilation time itself, so I would not recommend artificially splitting existing source code into multiple files; it would just mean more API calls to load the source, which again takes more time.
I’ve been using my own timing class on Windows, based on QueryPerformanceCounter().
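The core of it is just this (a minimal sketch):

#include <stdio.h>
#include <windows.h>

int main(void)
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);   /* ticks per second */

    QueryPerformanceCounter(&start);
    /* ... call clBuildProgram() here ... */
    QueryPerformanceCounter(&stop);

    printf("elapsed: %.6f s\n",
           (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart);
    return 0;
}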