compiler bug?

Christoph_John · December 17, 2008, 6:12pm

Hello,

I have a problem compiling some code, which looks like a CUDA bug to me. I hope this is the right place to report it.

[codebox]
#pragma unroll
for(int j=0; j<(BLOCK_DIM_X<<5); ++j)
    fSum += pfSMTmp[threadIdx.x + (j<<5)];
[/codebox]

This gives:

nvopencc ERROR: C:\CUDA\bin/…/open64/lib//be.exe returned non-zero status -1073741819

1>nvopencc INTERNAL ERROR: cannot unlink temp file C:/DOKUME~1/cjohn/LOKALE~1/Temp/ccBI#.a03880

It looks like it is related to #pragma unroll and the shift operation in (BLOCK_DIM_X<<5). Without #pragma unroll everything is fine, and with (BLOCK_DIM_X/32) the code works fine as well.

Cheers

Christoph

Christoph_John · December 17, 2008, 6:15pm


Sorry, this was my fault: the shift should of course go the other way (>>), and then it works as it should.
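For reference, a minimal sketch of the corrected loop in context. It assumes a hypothetical `BLOCK_DIM_X` of 256, one block of `BLOCK_DIM_X` threads that stores its values in a shared array `pfSMTmp`, and that the first warp (threads 0 to 31) sums them with a stride of 32; the kernel and helper names are illustrative, not from the original code. With `>> 5` the trip count is 256 / 32 = 8, a small compile-time constant that `#pragma unroll` can fully unroll, whereas the original `<< 5` asked for 256 × 32 = 8192 iterations and would also index far past the end of `pfSMTmp`.

```cuda
#include <cuda_runtime.h>

#define BLOCK_DIM_X 256  // assumed block size (hypothetical value)
static_assert(BLOCK_DIM_X % 32 == 0, "BLOCK_DIM_X must be a multiple of 32");

// Each of the first 32 threads sums every 32nd element of the block's
// shared array: elements t, t + 32, t + 64, ... for thread t.
__global__ void warpStrideSum(const float* in, float* out) {
    __shared__ float pfSMTmp[BLOCK_DIM_X];
    pfSMTmp[threadIdx.x] = in[blockIdx.x * BLOCK_DIM_X + threadIdx.x];
    __syncthreads();

    if (threadIdx.x < 32) {
        float fSum = 0.0f;
        // Trip count BLOCK_DIM_X >> 5 == BLOCK_DIM_X / 32 == 8: small and
        // known at compile time, so the loop can be fully unrolled.
#pragma unroll
        for (int j = 0; j < (BLOCK_DIM_X >> 5); ++j)
            fSum += pfSMTmp[threadIdx.x + (j << 5)];
        out[blockIdx.x * 32 + threadIdx.x] = fSum;
    }
}

// Hypothetical host helper: runs one block on an input of all ones and
// copies the 32 partial sums into hostOut.
bool runWarpStrideSum(float* hostOut) {
    float hostIn[BLOCK_DIM_X];
    for (int i = 0; i < BLOCK_DIM_X; ++i) hostIn[i] = 1.0f;

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, sizeof(hostIn));
    cudaMalloc(&dOut, 32 * sizeof(float));
    cudaMemcpy(dIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);

    warpStrideSum<<<1, BLOCK_DIM_X>>>(dIn, dOut);
    bool ok = cudaGetLastError() == cudaSuccess;
    // cudaMemcpy waits for the kernel and reports any execution error.
    ok = ok && cudaMemcpy(hostOut, dOut, 32 * sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}
```

Here `>> 5` and `/ 32` agree because the operand is a non-negative constant; for negative signed values the two can round differently.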

_Big_Mac · January 13, 2009, 3:22pm

I’ve stumbled onto something similar

[codebox]
__device__ int count(int a, int b) {
	int c = a*b;
	return c;
}

__global__ void testKernel1(float* a, float *b, float *c)
{
	int tid = threadIdx.x + blockIdx.x*blockDim.x;
	int ia = a[tid];
	int ib = b[tid];
	int ic = c[tid];

	//#pragma unroll
	for(int i=0; i<2048; i++) {
		ic=count(ia,ib);
		ia=count(ib,ic);
		ib=count(ia,ic);
	}
	c[tid] = ic;
	a[tid] = ia;
	b[tid] = ib;
}
[/codebox]

When I uncomment the pragma, I get those nvopencc errors (same as yours, can’t unlink temp file…).

Any ideas?

I’m using CUDA 2.0 on Windows XP 32 and VS 2005.

Christoph_John · January 13, 2009, 4:18pm

Hello,

the problem is that the compiler cannot fully unroll your loop, which runs 2048 times in your code. The compiler just crashes instead of reporting an error. Try a smaller unroll factor, for example:

#pragma unroll 10

This unrolls the loop by a factor of 10 (ten copies of the body per loop iteration) instead of unrolling it completely.

You will have to find the practical limit on how far you can unroll yourself. As far as I know, the maximum possible count depends on the register usage and code size of your kernel, so there is no general hard limit.
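A minimal sketch of the partial-unroll fix applied to the kernel above, assuming CUDA C++ and a factor of 10 as suggested (the best value still has to be found by measurement). `#pragma unroll N` asks the compiler to replicate the loop body N times per iteration rather than fully unrolling all 2048 iterations, and `#pragma unroll 1` disables unrolling. Unlike the original, this sketch uses `unsigned int` data so that the wraparound of the repeated multiplications is well defined; the names `mul`, `mulLoopKernel` and `runMulLoop` and the input values are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same body as count() above; unsigned so that overflow wraps in a defined way.
__device__ unsigned int mul(unsigned int a, unsigned int b) {
    return a * b;
}

// UNROLL is the requested unroll factor. #pragma unroll accepts an
// integral constant expression, including a template parameter.
template <int UNROLL>
__global__ void mulLoopKernel(unsigned int* a, unsigned int* b,
                              unsigned int* c, int n) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= n) return;  // guard the partial last block
    unsigned int ia = a[tid];
    unsigned int ib = b[tid];
    unsigned int ic = c[tid];
    // Replicate the body UNROLL times per iteration instead of
    // unrolling all 2048 iterations.
#pragma unroll (UNROLL)
    for (int i = 0; i < 2048; i++) {
        ic = mul(ia, ib);
        ia = mul(ib, ic);
        ib = mul(ia, ic);
    }
    c[tid] = ic;
    a[tid] = ia;
    b[tid] = ib;
}

// Hypothetical host helper: initialises a[i] = i + 1, b[i] = i + 2,
// c[i] = 0, runs the kernel with the given unroll factor and copies c
// back into hostC (n elements).
template <int UNROLL>
bool runMulLoop(unsigned int* hostC, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(unsigned int);
    std::vector<unsigned int> hostA(n), hostB(n);
    for (int i = 0; i < n; ++i) {
        hostA[i] = i + 1;
        hostB[i] = i + 2;
        hostC[i] = 0;
    }
    unsigned int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hostA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hostB.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hostC, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    mulLoopKernel<UNROLL><<<(n + threads - 1) / threads, threads>>>(dA, dB, dC, n);
    bool ok = cudaGetLastError() == cudaSuccess;
    // cudaMemcpy waits for the kernel and reports any execution error.
    ok = ok && cudaMemcpy(hostC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}
```

Calling `runMulLoop<1>` and `runMulLoop<10>` on the same input should give identical results, since unrolling changes only the generated code, not the computation; timing each variant is one way to pick the factor empirically.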

_Big_Mac · January 13, 2009, 4:40pm

Thanks, you were right.

Unrolling by a factor of up to about 45 improves runtime, while unrolling further gradually degrades performance, even though inspecting the .cubin file reveals no additional registers used (7 registers, 0 lmem). I presume I'm hitting the L1 cache limit? The binary data in the .cubin comprises about 500 lines, each holding four 32-bit instructions/operands, totalling slightly less than 8KB.
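As a quick check of the size estimate above, taking the figure of about 500 lines of four 32-bit (4-byte) words each at face value:

```latex
\[
500 \times 4 \times 4\,\mathrm{B} = 8000\,\mathrm{B} \approx 7.8\,\mathrm{KiB} < 8\,\mathrm{KiB} = 8192\,\mathrm{B}
\]
```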

Related topics

Topic	Replies	Views	Activity
Problems about #pragma unroll and auto optimization (CUDA Programming and Performance)	2	2836	March 10, 2009
Problem with unrolling loops (CUDA Programming and Performance)	9	8595	November 24, 2011
Problems about #pragma unroll and auto optimization (CUDA Programming and Performance)	0	3453	March 8, 2009
BUG? nvcc fails to unroll the loop (CUDA Programming and Performance)	6	6020	May 26, 2009
Extension cl_nv_pragma_unroll doesn't seem to work (CUDA Programming and Performance)	4	20140	October 12, 2011
loop unrolling (CUDA Programming and Performance)	11	17034	January 31, 2008
#pragma unroll not working? (CUDA Programming and Performance)	3	4898	June 8, 2009
Loop unroll & remainder perf (CUDA Programming and Performance)	6	3106	April 12, 2022
forcing loop unrolls (CUDA Programming and Performance)	4	665	October 11, 2018
Cuda compiler loop unroll bug? (CUDA Programming and Performance)	14	2459	October 25, 2017