Problem with ^(1/4)

wanderine · March 30, 2011, 7:31pm

I want to calculate the fourth root of all my elements, but I get very strange results.

T = sqrt(T);

works ok but

T = sqrt(T);
T = sqrt(T);

or

T = pow(T,0.25f);

does not work at all. Do I have to use double precision for these calculations or can I solve it in some other way?

avidday · March 30, 2011, 8:16pm

Try powf for single precision numbers.

SPWorley · March 30, 2011, 8:19pm

They should work fine.

But you’re also using the double-precision calls (though not double-precision variables). You may want to use sqrtf() and powf() instead. Even with the double-precision calls, though, you should still get the fourth root: the values would be cast up to double and the results cast back down to float.

So what’s your “strange result” that “does not work at all”? It’s unlikely that those two methods you list would fail so badly, especially if a single sqrt() is OK.

wanderine · March 31, 2011, 5:08am

No difference if I use sqrtf or powf instead. With two nested sqrt calls, all my values at the end of the algorithm are either 0 or 65535 after converting to uint16.

avidday · March 31, 2011, 6:25am

So your only way of judging whether these functions work is to look at a uint16 conversion or cast of the results? This program:

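#include <stdio.h>

__global__ void kernel(void)
{
    float val = 1.2345e6f;
    float exponent = 0.25f;
    /* Take the fourth root of val four times in a row. */
    for (int i = 0; i < 4; i++) {
        val = powf(val, exponent);
        printf("%d: %f\n", i, val);
    }
}

int main(void)
{
    kernel<<<1,1>>>();
    /* Wait for the kernel to finish so its printf output is flushed. */
    cudaDeviceSynchronize();
    return 0;
}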

does exactly what one would expect:

avidday@cuda:~$ nvcc -arch=sm_20 -Xptxas="-v" powf.cu
ptxas info    : Compiling entry function '_Z6kernelv' for 'sm_20'
ptxas info    : Function properties for _Z6kernelv
    16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 17 registers, 32 bytes cmem[0], 4 bytes cmem[14], 36 bytes cmem[16]
avidday@cuda:~$ ./a.out
0: 33.332874
1: 2.402803
2: 1.245029
3: 1.056319

Have you considered that something else might not be correct in the code, perhaps?

wanderine · March 31, 2011, 9:19am

Yes, of course, but since the problem seems related to the double square root, I wanted to check that first.

avidday · March 31, 2011, 9:40am

Introducing sqrt and pow, especially multiple calls to the double-precision versions, will greatly increase the register count of the compiled kernel. Are you really sure that the kernel is actually running? If you haven’t adjusted your block size, the kernel may be failing to launch.

wanderine · April 1, 2011, 1:54pm

Problem solved: I had made a mistake before the sqrt calls.

njuffa · April 1, 2011, 5:29pm

In terms of performance, you’d want to use nested calls to sqrt() instead of pow() to compute ^(1/4). pow() is quite expensive since it needs to handle many special cases and also needs additional arithmetic operations to ensure good accuracy across all combinations of arguments.

wanderine · April 3, 2011, 9:34am

Performance is not the major issue in this case. Now I want to calculate ^(1/16), but 32-bit floats seem insufficient for this.

avidday · April 3, 2011, 9:48am

What do you mean by “insufficient”? Accuracy? Range? What magnitude numbers are you trying to compute with?
