Problem of "Demoting to float"

This is one kernel function:

__global__ void get_lpfOff2(float *lpfSig3, float *carrFreq, float *remCarrPhase, float *lpfOff2, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < N)
    {
        int j = i%(9999*2);

        if (j >= 10000)
        {
            lpfOff2[i] = 0;
        }
        else
        {
            int k = i/(9999*2);
            float m = carrFreq[k];
            float n = remCarrPhase[k];
            float alpha = -2*Pi*m*float(j)/5e6+n;

            if (i % 2 == 0)
            {
                lpfOff2[i] = lpfSig3[i]*cos(alpha)-lpfSig3[i+1]*sin(alpha);
            }
            else
            {
                lpfOff2[i] = lpfSig3[i-1]*sin(alpha)+lpfSig3[i]*cos(alpha);
            }
        }
    }
}

When I compiled it, I got this warning:

ptxas /tmp/tmpxft_00004262_00000000-5_gpsCodeProcess.compute_10.ptx, line 2095; warning : Double is not supported. Demoting to float

Can anyone help me figure out why?

I think my data processing result is incorrect because of this double-to-float issue, but I don’t know how to fix it.


What GPU are you using? Double precision is only supported from compute capability 1.3 onwards. However, nvcc defaults to compute capability 1.0, which is what you are compiling for at the moment, as the “compute_10” in the file name indicates. If you have a compute capability 1.3 device, compile with “nvcc -arch sm_13” (or “sm_20” or “sm_21” for 2.0 and 2.1, respectively). If you want the code to also run on newer devices, use “compute_13” (or “compute_20”, which also covers 2.1 devices) instead, and the just-in-time compiler will be invoked each time the program is started to compile for the actual GPU present.

Looking at your code, you’re not intentionally using double precision anywhere, which makes the warning a little odd. Most likely Pi is defined as a double-precision constant, so the expression it appears in is evaluated in double precision and only then converted to single precision for the assignment. Be careful to use float literals in your code, i.e. 2.0f rather than 2.0: the compiler treats 2.0 as a double literal, and it may then evaluate a subexpression in double precision even though you intended single precision. That can hurt performance, and it appears to be what is happening here.
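The promotion rule described above is ordinary C/C++ behaviour, so it can be checked on the host without a GPU. The sketch below assumes that Pi is a double constant (the original post does not show its definition); Pi_f, is_double() and check_promotion() are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: Pi is a double constant; its real definition is not shown in the post.
const double Pi = 3.14159265358979323846;
// Hypothetical single-precision counterpart, introduced for illustration.
const float Pi_f = 3.14159265f;

// Compile-time view: one double operand promotes the whole product to double.
static_assert(std::is_same<decltype(-2 * Pi * 1.0f), double>::value,
              "a double constant promotes the product to double");
// An unsuffixed literal such as 5e6 is a double as well.
static_assert(std::is_same<decltype(1.0f / 5e6), double>::value,
              "an unsuffixed literal promotes the quotient to double");
// With float operands and 'f'-suffixed literals the expression stays float.
static_assert(std::is_same<decltype(-2.0f * Pi_f * 1.0f / 5e6f), float>::value,
              "float-only operands keep the expression in float");

// Run-time view of the same rule: overload resolution picks the exact type.
inline bool is_double(double) { return true; }
inline bool is_double(float)  { return false; }

// Small self-check that can be called from any test or main().
inline void check_promotion()
{
    float m = 1.0f;
    assert(is_double(-2 * Pi * m / 5e6));        // as written in the kernel
    assert(!is_double(-2.0f * Pi_f * m / 5e6f)); // float-clean variant
}
```

If the three static_assert lines compile, the double arithmetic in the kernel comes from Pi and 5e6 rather than from the float variables.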

I would suggest following Tera’s advice. If you actually intend to compile for sm_10, you would want to make the code “float clean”. To do so, convert all floating-point literals, such as “5e6”, to their single-precision equivalents by appending an ‘f’ (“5e6f” in this example). By default, floating-point literals are of type double in C/C++. It might also make sense to explicitly invoke the single-precision variants of all math functions, i.e. use “sinf()” and “cosf()” instead of “sin()” and “cos()”.
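As a rough sketch of what “float clean” could look like for the arithmetic in this kernel, the two helpers below use only suffixed literals and the single-precision math functions. They are written as plain host functions so they can be compiled without a GPU; in a .cu file they would presumably be marked __device__ and called from the kernel. PI_F, carrier_phase() and rotate_sample() are hypothetical names that do not appear in the original code.

```cpp
#include <math.h>

// Hypothetical single-precision constant; the original Pi is not shown in the post.
static const float PI_F = 3.14159265f;

// Float-clean form of the phase computation from the kernel:
// every literal carries an 'f' suffix, so no operand is a double.
inline float carrier_phase(float carrFreq, float remCarrPhase, int j)
{
    return -2.0f * PI_F * carrFreq * float(j) / 5e6f + remCarrPhase;
}

// Per-sample rotation with the same even/odd branches as the kernel,
// using sinf()/cosf() explicitly. The caller must make sure that i+1
// (for even i) and i-1 (for odd i) are valid indices into lpfSig3.
inline float rotate_sample(const float *lpfSig3, int i, float alpha)
{
    const float c = cosf(alpha);
    const float s = sinf(alpha);
    if (i % 2 == 0) {
        return lpfSig3[i] * c - lpfSig3[i + 1] * s;
    }
    return lpfSig3[i - 1] * s + lpfSig3[i] * c;
}
```

Inside the kernel, the else branch would then reduce to something like lpfOff2[i] = rotate_sample(lpfSig3, i, carrier_phase(carrFreq[k], remCarrPhase[k], j)), assuming the helpers are made available to device code.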

Although it probably does not matter here (the code looks memory bound), as a general performance consideration it might make sense to use the function sincosf(), which conveniently computes both the sine and the cosine of the input argument, faster than they can be computed separately:

float sin_alpha, cos_alpha;

sincosf (alpha, &sin_alpha, &cos_alpha);

if (i % 2 == 0) {
    lpfOff2[i] = lpfSig3[i]*cos_alpha-lpfSig3[i+1]*sin_alpha;
} else {
    lpfOff2[i] = lpfSig3[i-1]*sin_alpha+lpfSig3[i]*cos_alpha;
}