[BUG] clBuildProgram taking exponential amount of time

Hi forum,

I recently started porting a kernel I had originally written for CUDA to OpenCL. Since it does not use any advanced features, few changes were necessary apart from address-space keywords such as __global.

However, I discovered that clBuildProgram takes an exponential amount of time, roughly c * 2^n for n BF_ENC operations in the kernel. This only happens when I declare the second argument __constant; with __global it compiles quickly and the resulting PTX looks just fine.

I’ve tried to reduce the problem as much as possible (so don’t try to read any functionality into it) with the kernel below. Replace every __constant with __global and watch what happens. Also try reducing the number of calls to BF_ENC until it compiles in about 10 seconds:

#define BF_ENC(LL,R,S,P) ( \
	LL^=(((	S[       ((uint)(R>>24)&0xff)] + \
		S[0x0100+((uint)(R>>16)&0xff)])^ \
		S[0x0200+((uint)(R>> 8)&0xff)])+ \
		S[0x0300+((uint)(R    )&0xff)]) \
	)

__kernel void BFencKernel(__global unsigned long *data, __constant unsigned int *bf_constant_schedule) {
	__private unsigned long block = data[get_global_id(0)];

	unsigned int l, r;
	l = data[get_global_id(0)];
	r = l;
	__constant unsigned int *p=bf_constant_schedule;
	__constant unsigned int *s=bf_constant_schedule+18;
	l^=p[0];
	BF_ENC(r,l,s,p[ 1]);
	BF_ENC(l,r,s,p[ 2]);
	BF_ENC(r,l,s,p[ 3]);
	BF_ENC(l,r,s,p[ 4]);
	BF_ENC(r,l,s,p[ 5]);
	BF_ENC(l,r,s,p[ 6]);
	BF_ENC(r,l,s,p[ 7]);
	BF_ENC(l,r,s,p[ 8]);
	BF_ENC(r,l,s,p[ 9]);
	BF_ENC(l,r,s,p[10]);
	BF_ENC(r,l,s,p[11]);
	BF_ENC(l,r,s,p[12]);
	BF_ENC(r,l,s,p[13]);
	BF_ENC(l,r,s,p[14]);
	BF_ENC(r,l,s,p[15]);
	BF_ENC(l,r,s,p[16]);
	r^=p[17];
	block = ((unsigned long)r) << 32 | l;
	data[get_global_id(0)] = block;
}
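To make the "reduce the number of BF_ENC calls" experiment less tedious, here is a small helper script sketch that regenerates the reduced kernel with a configurable round count and address-space qualifier, so build times can be bisected against n. The function name emit_kernel is just illustrative, not part of any API:

```python
# Hypothetical helper: regenerate the reduced test kernel with a chosen
# number of BF_ENC rounds and a chosen address-space qualifier
# ("__constant" or "__global"), to bisect where the build time blows up.

MACRO = r"""#define BF_ENC(LL,R,S,P) ( \
	LL^=(((	S[       ((uint)(R>>24)&0xff)] + \
		S[0x0100+((uint)(R>>16)&0xff)])^ \
		S[0x0200+((uint)(R>> 8)&0xff)])+ \
		S[0x0300+((uint)(R    )&0xff)]) \
	)
"""

def emit_kernel(rounds=16, space="__constant"):
    """Return OpenCL C source for the test kernel with `rounds` BF_ENC calls."""
    lines = [
        MACRO,
        "__kernel void BFencKernel(__global unsigned long *data, "
        f"{space} unsigned int *bf_constant_schedule) {{",
        "\tunsigned int l, r;",
        "\tl = data[get_global_id(0)];",
        "\tr = l;",
        f"\t{space} unsigned int *p=bf_constant_schedule;",
        f"\t{space} unsigned int *s=bf_constant_schedule+18;",
        "\tl^=p[0];",
    ]
    for i in range(1, rounds + 1):
        # Rounds alternate target/source halves: r,l on odd rounds, l,r on even.
        dst, src = ("r", "l") if i % 2 else ("l", "r")
        lines.append(f"\tBF_ENC({dst},{src},s,p[{i}]);")
    lines.append(f"\tr^=p[{rounds + 1}];")
    lines.append("\tdata[get_global_id(0)] = ((unsigned long)r) << 32 | l;")
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(emit_kernel(rounds=16, space="__constant"))
```

Feeding the generated source to clBuildProgram (or clcc) for rounds = 4, 8, 12, 16 should make the c * 2^n growth, or its absence with __global, easy to see.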

I’m using CUDA Toolkit 3.2, driver 270.18, Gentoo Linux 2.6.37-x86_64 (vanilla). OpenCL is 1.0 obviously, and the generated PTX is 2.2. GCC is version 4.5.1-r1. Right now I’m passing “-w -cl-nv-verbose” as options to clBuildProgram, but they make no difference. The build is for my GeForce 8600 GT (CC 1.1).

I’ve read posts about some strange memory dependence when using constant memory, but I can’t find anything wrong with my code, especially since it is very similar to my CUDA C routine.

Small amendment: The kernel compiles fine (and yields the expected PTX, with a lot of ld.const.u32 instructions) when I use the clcc tool available here: http://forums.nvidia.com/index.php?showtopic=188884