bug in loop?

zeyangl · May 19, 2011, 5:08am

24 float C[BDY] = {0};
25
…
35 #pragma unroll 1
36 for(int by=0; by<BDY; by++)
37 {
38     float b = inputB[…];
39
40     #pragma unroll 1
41     for(int row=0; row<BDY; ++row)
42     {
43         int col = by;
44         C[row] += shared[row*A_BLOCK_DIM + col] * b;
45     }
46 }
…

I’ve been battling this for a couple of days, and I really can’t think of an explanation other than a bug in the OpenCL compiler.

The problem is line 44 (C[row] += …): when I compile it as shown, the kernel returns almost immediately, producing neither correct results nor errors.

If I change it to C[by], the kernel returns fine, with correct timing and everything.

If I instead write

44 C[row] += shared[row*A_BLOCK_DIM + col] * b;
45 C[row] += 1.0;
46 C[row] -= 1.0;

the kernel produces the correct result, but the timing is off because I’m doing extra work in my innermost loop.
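To tell whether a given variant really produces the correct result, it can help to compare the kernel output against a plain host-side version of the same loop. The following is only a sketch: the values of BDY and A_BLOCK_DIM are assumed for illustration, and because the index into inputB is elided above, the hypothetical b_vals[by] array stands in for whatever inputB[…] reads on each outer iteration.

```c
#include <stdio.h>

/* Assumed sizes for illustration only; the post does not give them. */
#define BDY 4
#define A_BLOCK_DIM 4

/* Host-side reference for the nested loop on lines 36-46.
   b_vals[by] is a hypothetical stand-in for inputB[...], whose index
   is elided in the post. C must have room for BDY floats. */
static void reference_accumulate(const float *shared,
                                 const float *b_vals,
                                 float *C)
{
    /* Same as "float C[BDY] = {0};" on line 24. */
    for (int row = 0; row < BDY; ++row)
        C[row] = 0.0f;

    for (int by = 0; by < BDY; ++by) {
        float b = b_vals[by];
        for (int row = 0; row < BDY; ++row) {
            int col = by;
            C[row] += shared[row * A_BLOCK_DIM + col] * b;
        }
    }
}

/* Prints the reference values so they can be compared by eye
   with what the kernel wrote back. */
static void print_reference(const float *C)
{
    for (int row = 0; row < BDY; ++row)
        printf("C[%d] = %f\n", row, C[row]);
}
```

Fill shared and b_vals with the same data one work-item sees, call reference_accumulate, and compare with the values read back from the device, allowing for a small floating-point tolerance.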

Is anyone seeing a similar issue? I have the CUDA SDK 4.0; the same thing happened with CUDA 3.2.

Thanks…

FlaviusV · May 19, 2011, 9:29pm

I can’t say why this is broken, but you could save the compiled kernel as a PTX assembly file and look at what the code is really doing.
