CUDA compiler for Visual C++ not recognizing template constants?

I recently added a BIGint arbitrary-precision class to my gxLibrary ( https://sourceforge.net/projects/gxlibrary ), which allows working with large integers on CUDA (and AMP and CPU too), like:

intB<128> A=7, B=99, C=A/B;
A>>=1;  B++;

Internally, the intB class defines a constant integer that represents how many 32-bit unsigned ints are used. Something like:

static const int N= Nbits/32;
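
For context, intB itself is a template over the bit width, so N is known at compile time; roughly this layout (member names here are only illustrative, not the exact gxLibrary declarations):

template <int Nbits>
class intB {
	static const int N = Nbits/32;	// number of 32-bit words used
	unsigned int d[N];		// d[0] is the least significant word
public:
	intB& operator++();
	// ... rest of the arithmetic operators
};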

When I decided to optimize some operations for low Ns (for example N==4 for 128-bit, or N==3 for 96-bit), I used something like this in the code:

intB& operator++(){
	switch (N){
		case 4: if (!++d[0]) if (!++d[1]) if (!++d[2]) ++d[3]; break; 
		case 3: if (!++d[0]) if (!++d[1]) ++d[2]; break;
		case 2: if (!++d[0]) ++d[1]; break;
		case 1: ++d[0]; break;
		default:
			// generic path for any N: propagate the carry from the lowest word upward
			for (int i=0; i<N; i++){
				++d[i];
				if (d[i]) break;
			}
	}
	return *this;
}

Since ‘N’ used above is a constant (templates are resolved at compile time), compilers should remove any unneeded code paths at compile time, so switch(N) effectively becomes just the code for the given N, and it works exactly like that when compiled for CPU or AMP (gxLibrary compiles code for all three: CUDA/AMP/CPU).

But in the case of the CUDA compiler, it appears not to recognize that N is constant, since it gives multiple “subscript out of range” warnings (the code contains d[3] even when N==2, but the part where d[3] is used should have been eliminated at compile time).

While I could ignore the warnings, my main question is whether those are only warnings, or whether the CUDA compiler also failed to remove the unneeded code paths and left those “if (N==xyz)” comparisons or “switch(N)” code in, in which case it would also have a slight performance impact.

Are you compiling a debug or a release project? Stated another way, are you compiling with the -G switch?

You can figure out what code is actually present by inspecting the PTX or SASS code generated.
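
For a single-file test case, something along these lines works from the command line (file names here are just examples):

nvcc -ptx test.cu -o test.ptx		# emit PTX for inspection
nvcc -cubin -arch=sm_35 test.cu -o test.cubin
cuobjdump -sass test.cubin		# disassemble the cubin to SASS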

The compiler warnings appear with both debug and release compiles.

It was not possible for me to easily check the PTX or SASS, since Nsight is not able to map source code to PTX/SASS if the global function header is part of a #define macro (which I use in gxLibrary), or I would have done it sooner.

But I decided to make a minimal test CUDA app with a similar case and check with that:

const int CONST=2;

__global__ void test_cu(int* gpuData){
	int res, d[2]={2,3};
	switch (CONST){
		case 1: res=d[0]; break;
		case 2: res=d[1]; break;
		default:
			res=d[2];	// out of range on purpose; unreachable when CONST==2
	}
	gpuData[ threadIdx.x]=res;
}

The above code should always resolve to gpuData[threadIdx.x]=3.
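
The host side is not important for the test; a minimal launcher along these lines is enough (illustrative only, error checking omitted):

#include <cstdio>

int main(){
	int host[32], *gpuData;
	cudaMalloc((void**)&gpuData, sizeof(host));
	test_cu<<<1, 32>>>(gpuData);
	cudaMemcpy(host, gpuData, sizeof(host), cudaMemcpyDeviceToHost);
	printf("gpuData[0]=%d\n", host[0]);	// expected output: gpuData[0]=3
	cudaFree(gpuData);
	return 0;
}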

The good news is that, when compiled for Release, the CUDA compiler does a good job and optimizes this. The entire SASS code looks like:

     MOV R1, c[0x0][0x44];
     S2R R0, SR_TID.X;
     MOV32I R2, 0x3;
     ISCADD R0, R0, c[0x0][0x140], 0x2;
     ST [R0], R2;
     EXIT;

The bad news is that, even when compiling for Release, where the compiler ‘eventually’ removes things like ‘res=d[2]’, the CUDA compiler still shows the warnings (I guess it emits warnings before the optimization stage).

More bad news is that when compiled for Debug, there is no apparent optimization and the complete switch(CONST) code remains, which indicates that the CUDA compiler does not do these optimizations for Debug builds (and also explains why debug builds are significantly slower). I wonder if there is a CUDA option in Visual Studio that allows a CUDA debug build with optimizations, but that is not related to my initial question.

The conclusion here is that the NVIDIA CUDA compiler correctly optimizes out switch() parts based on a constant only in Release builds, and even then it issues warnings for code paths that will never be reached (i.e. those it will optimize out), while in Debug builds the compiler does not seem to optimize these things at all.

The fact that the device compiler does not optimize when compiled with -G is a common observation. There are a variety of reasons for this. One is that when attempting to debug code, it’s convenient to have something that approximately represents the original source code, so as to facilitate things like source-level debugging (e.g. setting breakpoints at particular lines of source code). The fully optimized code can make wholesale changes to program organization at the assembly level such that it is nearly impossible to map it back to the original source code.
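
A common middle ground, if the goal is mainly source correlation in the profiler or Nsight rather than full source-level debugging, is to compile with -lineinfo instead of -G; that keeps the optimizations while still embedding source line information, e.g.:

nvcc -lineinfo -o test test.cu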

True, but not directly the issue here, even if it would be nice to have an option to build a CUDA debug configuration with or without optimizations.

The issue for me is that, even when building for Release, the CUDA compiler reports warnings about lines of code that it will remove. But at least now I know those are false warnings.
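
If those false warnings ever become too noisy, one possible workaround (assuming the code can be compiled as C++17, which newer CUDA toolkits accept via -std=c++17) would be to replace switch(N) with if constexpr, so the dead branches are discarded during template instantiation and the out-of-range subscripts are never even seen by the warning pass. A sketch of the idea:

intB& operator++(){
	if constexpr (N == 1){
		++d[0];
	} else if constexpr (N == 2){
		if (!++d[0]) ++d[1];
	} else {
		// generic carry-propagation loop for any N
		for (int i=0; i<N; i++){
			++d[i];
			if (d[i]) break;
		}
	}
	return *this;
}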