nvcc: compile-time kernel with dead code elimination

norman.pellet · March 6, 2020, 10:06am

Hello!

I have a question about how nvcc handles dead code elimination. It may be more of a general compiler question than one specific to nvcc. Imagine the following kernel:

template<typename T, bool compileTimeBlnTest>
void __global__ myKernel( ...some parameters ) {

	if (true == compileTimeBlnTest) { // Compile time ?
		// Do something
	}
	else {
		// Do something else
	}
}

My issue is that “Do something” compiles with T as float but cannot compile with T as unsigned char (no matching overload exists, which is correct and by design).

In practice, however, the kernel is only ever instantiated with T as float when compileTimeBlnTest is true, and with T as unsigned char otherwise. Yet it does not compile.

My feeling is that both branches of the condition are compiled regardless of the template argument, and that dead code elimination only happens later on.
Is that correct?

A more complete example to illustrate my point:

template<typename T, bool compileTimeBlnTest>
void __global__ myKernel( T* dataOut, T* dataIn ) {

	if (true == compileTimeBlnTest) { // Compile time ?
	    atomicAdd( dataOut, dataIn[0] );
	}
	else {
	    dataOut[0] += dataIn[0];
	}
}

with the following calls

float *dataOut, *dataIn;
myKernel<float, true><<<...>>>( dataOut, dataIn );

// Or 

unsigned char *dataOut, *dataIn;
myKernel<unsigned char, false><<<...>>>( dataOut, dataIn );
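
For reference, the behaviour can be reproduced without nvcc. An ordinary `if` is a runtime statement even when its condition is a template constant, so both branches must be well-formed for every instantiation, and the optimizer only removes the dead branch afterwards. Below is a minimal host-only C++ sketch of one possible workaround, tag dispatch, which works before C++17. The names `accumulate`, `update` and `myKernelBody` are hypothetical: `accumulate` merely stands in for atomicAdd by being overloaded for float only, and the `__global__` qualifier and kernel launch are left out so the sketch builds with any C++11 compiler.

```cpp
#include <type_traits>

// Hypothetical stand-in for atomicAdd: only a float overload exists,
// so a call with unsigned char* arguments would fail to compile.
inline float accumulate(float* address, float value) {
    float old = *address;
    *address += value;
    return old;
}

// Overload chosen when the flag is true. Its body is only instantiated
// for the types that are actually dispatched to it.
template <typename T>
void update(T* dataOut, const T* dataIn, std::true_type) {
    accumulate(dataOut, dataIn[0]);
}

// Overload chosen when the flag is false: plain read-modify-write.
template <typename T>
void update(T* dataOut, const T* dataIn, std::false_type) {
    dataOut[0] += dataIn[0];
}

// Equivalent of the kernel body: the bool template parameter is turned
// into a type, so overload resolution picks exactly one update() and the
// other one is never instantiated for this T.
template <typename T, bool compileTimeBlnTest>
void myKernelBody(T* dataOut, const T* dataIn) {
    update(dataOut, dataIn,
           std::integral_constant<bool, compileTimeBlnTest>{});
}
```

With this layout, `myKernelBody<float, true>` and `myKernelBody<unsigned char, false>` both compile, while `myKernelBody<unsigned char, true>` is still rejected as intended. Where the toolchain accepts C++17, `if constexpr (compileTimeBlnTest)` inside the template should give the same effect with less code, since the discarded branch is not instantiated.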