How to call math.h cos from a .cu file (need math.h cos in the .cu file, not in a kernel)

bin04017 · December 14, 2007, 10:13am

Hi,
How can I use cos / sin from the standard math.h in .cu files?
cheers

kuisma · December 14, 2007, 7:28pm

You again? :)

You can’t. cos() is a double precision function; CUDA only supports single precision (float), i.e. the cosf() function.

And it is really not “from std math.h”. The header files are just declarations. The actual function is defined in the math library, and there is one math library for your CPU and another for your GPU, and they are not interchangeable. You cannot run your gasoline engine on diesel.

Use float precision if you are going to use CUDA. If it is not good enough, do not use CUDA.

mfatica · December 14, 2007, 7:42pm

You can have a .cu file that contains host code.
Host code is handed by nvcc to the host compiler, and it will use the host math.h.
If you have a kernel or device function, it will use the CUDA math.h.
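To make the host-code case concrete, here is a minimal sketch of a function that could sit in a .cu file next to your kernels. It has no `__global__` or `__device__` qualifier, so it is host code and `cos()` comes from the CPU math library. The name `make_cos_table` and its parameters are made up for illustration only.

```cpp
#include <math.h>
#include <vector>

// Host-only code: no __global__ / __device__ qualifier, so in a .cu file
// nvcc hands this function to the host compiler and cos() resolves to
// the CPU math library's double precision version.
// make_cos_table is a hypothetical helper, not part of any CUDA API.
std::vector<double> make_cos_table(int n, double step) {
    std::vector<double> table(n);
    for (int k = 0; k < n; ++k) {
        table[k] = cos(k * step);  // host math.h cos, double precision
    }
    return table;
}
```

A table built this way could then be copied to the device with cudaMemcpy if the kernel needs the values.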

mykii · December 16, 2007, 12:43pm

It is not possible, simply because the opcodes on the GPU are different from those of your processor. So none of the libraries on your system will be compatible with the GPU.

-mv

Jedimaster · December 17, 2007, 5:55am

Maybe you can manually write PTX assembly code; there are SIN and COS opcodes.

kuisma · December 17, 2007, 8:45am

No. The problem is that you do not have any native double precision floating point data type in CUDA/PTX, and the performance hit of emulating it in software (assembler, C or whatever) is too high to be interesting.

– Kuisma

bin04017 · December 17, 2007, 9:11am

I know that it is useless, but I am asking how I could use it, not because I need the precision… just to find out how to tell CUDA to take the std C functions instead of the CUDA lib functions…

kuisma · December 17, 2007, 9:17am

I think you are looking for a solution to your problem in the wrong direction.

What is the problem you are trying to solve?

DenisR · December 17, 2007, 12:53pm

The part that runs on the host CPU uses the std C functions. The part that runs on the GPU uses the CUDA lib functions. You cannot change that, since the standard C library does not support the GPU.
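A small sketch of what that split looks like in source form, assuming you mark one function for both sides. The `HD` macro and the name `cos_anywhere` are hypothetical. Under nvcc the function is compiled twice: the host copy calls the CPU library's cosf(), the device copy calls the CUDA library's cosf(). Under a plain C++ compiler only the host copy exists.

```cpp
#include <math.h>

// HD expands to CUDA's qualifiers when compiled by nvcc and to nothing
// otherwise, so this file also builds with an ordinary C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: the same source line calls a different cosf()
// implementation depending on which side it was compiled for.
HD inline float cos_anywhere(float x) {
    return cosf(x);  // host math library on the CPU, CUDA math library on the GPU
}
```

You do not pick the library yourself; which cosf() you get follows from where the calling code runs.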

spiker · December 17, 2007, 4:38pm

You can write a routine of your own in CUDA:

if x is the angle (in radians) then

sin(x)=x-(x^3)/(3!)+(x^5)/(5!)-(x^7)/(7!)+(x^9)/(9!)-…

cos(x)=1-(x^2)/(2!)+(x^4)/(4!)-(x^6)/(6!)+…

This is really very parallelizable: take into account the error you commit when truncating the series.

Please refer to http://www.efunda.com/math/taylor_series/trig.cfm for the description of other trigonometric / exponential functions; hope it could be useful
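As a rough illustration of the truncated cos series, here is a minimal serial sketch. The name `cos_maclaurin`, the `HD` macro and the `n_terms` parameter are made up for this example. It builds each term from the previous one instead of computing powers and factorials separately, and it is only meant for small |x| (say |x| <= pi/2, after range reduction). Because the series alternates with shrinking terms in that range, the truncation error is no larger than the first term you leave out.

```cpp
#include <math.h>

// HD expands to CUDA's qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: cos(x) from the constant 1 plus the next n_terms
// terms of the Maclaurin series 1 - x^2/2! + x^4/4! - ...
HD inline double cos_maclaurin(double x, int n_terms) {
    double x2 = x * x;
    double term = 1.0;  // current term, starts at x^0/0!
    double sum = 1.0;
    for (int k = 1; k <= n_terms; ++k) {
        // term_k = -term_(k-1) * x^2 / ((2k-1) * 2k)
        term *= -x2 / ((2.0 * k - 1.0) * (2.0 * k));
        sum += term;
    }
    return sum;
}
```

For example, with n_terms = 2 at x = 0.5 the first omitted term is x^6/6!, so the result is within about 2.2e-5 of the true cosine.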

mfatica · December 18, 2007, 2:51am

Using Taylor or Maclaurin series is not the right approach.
You want to use minimax polynomials and the Remez algorithm to find their coefficients.

sachin · April 14, 2009, 5:02am

Hey,

I know this was posted yonks ago but I thought I'd post anyway. It took me about 2 days of work until I realised that there was a double precision version of cos, sin, tan etc. written ALREADY!!! I'm guessing it wasn't written at the time this message was posted.

Anyway, since I spent so much time on it I thought I'd share my code for the cos function. It is parallelised in almost every possible way I can think of.

[codebox]#include <cuda.h>
#include <cuda_runtime.h>
#include "liarp.h"             /*poster's own header, not shown in this thread*/
#include "arith_const_types.h" /*poster's own header; assumed to define DBL_TYPE (as double) and DBL_CODE*/

#define DOUBLE_EPSILON 2.22045e-16
#define PI 3.14159265358979323846
#define TAYLOR_TERMS 10 /*excluding constant*/
#define BLOCK_X 32 /*elements per block; launch with blockDim = (BLOCK_X, TAYLOR_TERMS)*/

//factorials: (2k)! for k = 1..10
#define TWO 2
#define FOUR 24
#define SIX 720
#define EIGHT 40320
#define TEN 3628800
#define TWELVE 479001600
#define FOURTEEN 87178291200
#define SIXTEEN 20922789888000
#define EIGHTEEN 6402373705728000
#define TWENTY 2432902008176640000

__device__ double abs_CUDA (double x);
__device__ double remainder_CUDA (double dividend, double divisor);
__device__ void power (double x, double taylor_array[TAYLOR_TERMS][BLOCK_X]);
__device__ void cos_taylor (double x[TAYLOR_TERMS][BLOCK_X], const long long int *factorial);

//takes the cos of the vector in and returns it in vector out
//threadIdx.x selects the element, threadIdx.y selects the Taylor term
__global__ void cos_kernel (void *in, void *out, int code_in, long elements){
	__shared__ DBL_TYPE	di_x[TAYLOR_TERMS][BLOCK_X];
	const long long int factorial[TAYLOR_TERMS]={TWO, FOUR, SIX, EIGHT, TEN, TWELVE, FOURTEEN, SIXTEEN, EIGHTEEN, TWENTY};
	long i = (long)blockIdx.x * blockDim.x + threadIdx.x;
	bool valid = (i<elements && code_in == DBL_CODE);
	DBL_TYPE x = 0, sign = 1;

	if (valid)
		x=((DBL_TYPE *)in)[i];

	/*cos function is periodic, repeats itself every 2*PI*/
	x=remainder_CUDA(x,2*PI);
	/*cos(-x)=cos(x)*/
	x=abs_CUDA(x);

	/*this section exploits the fact that the cos function looks SIMILAR every PI/2:*/
	/*fold x into the first quadrant and remember the sign of the result*/
	if (x>=PI/2 && x<PI){/*second quadrant*/
		x=PI-x;
		sign=-1;
	}
	else if (x>=PI && x<3*PI/2){/*third quadrant*/
		x=x-PI;
		sign=-1;
	}
	else if (x>=3*PI/2){/*fourth quadrant*/
		x=2*PI-x;
	}
	/*first quadrant: x and sign remain the same*/

	/*every thread of the block must make these calls because they use __syncthreads()*/
	power(x,di_x);
	cos_taylor(di_x, factorial);

	if (valid && threadIdx.y==0)/*ensure one thread per element writes the result*/
		((DBL_TYPE *)out)[i]=sign*di_x[0][threadIdx.x];
}

//returns (by reference) the cosine of the angle whose even powers are stored in x
//answer is saved in first row of 2D array, x
__device__ void cos_taylor (double x[TAYLOR_TERMS][BLOCK_X], const long long int *factorial){
	int i = threadIdx.x;/*column: element within the block*/
	int j = threadIdx.y;/*row: Taylor term*/
	double sign=-1, cosine=1;

	x[j][i]=x[j][i]/(double)factorial[j];
	__syncthreads();/*all terms must be scaled before they are summed*/

	/*this section could be parallelised but i cant be bothered*/
	if (j==0){/*ensure one thread is used only, otherwise will create memory access errors*/
		for (int k=0;k<TAYLOR_TERMS;k++){
			cosine+=sign*x[k][i];/*add up vector but also adjust sign*/
			sign=sign*-1.0;/*alternate sign*/
		}
		x[0][i]=cosine;
	}
}

//returns the absolute value of x
__device__ double abs_CUDA (double x){
	if (x<0)
		return -x;
	else return x;
}

//returns remainder after dividend/divisor
//(named remainder_CUDA so it does not clash with the built-in remainder())
__device__ double remainder_CUDA (double dividend, double divisor){
	long long quotient=(long long)(dividend/divisor);
	return dividend-(double)quotient*divisor;
}

//raises x to the power according to TAYLOR ARRAY: row j ends up holding x^(2*(j+1))
//makes use of the associativity of multiplication so it can be done in fewer iterations
__device__ void power (double x, double taylor_array[TAYLOR_TERMS][BLOCK_X]){
	int i = threadIdx.x;
	int j = threadIdx.y;

	taylor_array[j][i]=x*x;/*fill up array with x^2, the cos series only has even powers*/
	__syncthreads();

	for (int gap=1; gap<TAYLOR_TERMS; gap*=2){/*double gap each round*/
		double factor=1.0;
		if(j>=gap)/*can only multiply elements after the 'gap'th element*/
			factor=taylor_array[j-gap][i];/*value in array 'gap' elements behind*/
		__syncthreads();/*finish all reads before any write*/
		taylor_array[j][i]*=factor;
		__syncthreads();/*finish all writes before the next round*/
	}
}/*end function power*/
[/codebox]

Oh, and don't bother pointing out the fact that no one is actually gonna use this code. I just had to upload it so I could feel as if my time hasn't gone to a total waste!!! And I only used the Taylor (Maclaurin) series, not the other one (minimax, Remez…)
