Hi,
How can I use cos / sin from the standard math.h in .cu files?
cheers
You again? :)
You can’t. cos() is a double-precision function, and CUDA only supports single precision, i.e. the cosf() function.
And it is not really “from std math.h”. The header files are just declarations. The actual function is defined in the math library, and there is one math library for your CPU and another for your GPU, and they are not interchangeable. You cannot run your gasoline engine on diesel.
Use single precision if you are going to use CUDA. If it is not good enough, do not use CUDA.
You can have a .cu file that contains host code.
Host code is handed to the host compiler by nvcc, and it will use the host math.h.
If you have a kernel or device function, it will use the CUDA math.h.
It is not possible, simply because the opcodes on the GPU are different from those of your processor. So none of the libraries on your system will be compatible with the GPU.
-mv
Maybe you can write PTX assembly code by hand; there are SIN and COS opcodes.
No. The problem is that you do not have any native double precision floating point data type in CUDA/PTX and the performance hit emulating this in software (assembler, C or whatever) is too high to be interesting.
– Kuisma
I know that it is useless, but I was asking how I could use it, not saying that I need the precision… I just want to find out how to tell CUDA to use the standard C functions instead of the CUDA library functions…
I think you are looking for a solution to your problem in the wrong direction.
What is the problem you are trying to solve?
The part that runs on the host CPU uses the standard C functions. The part that runs on the GPU uses the CUDA library functions. You cannot change that, since the standard C library does not support the GPU.
You can write a routine of your own in CUDA:
if x is the angle (in radians) then
sin(x)=x-(x^3)/(3!)+(x^5)/(5!)-(x^7)/(7!)+(x^9)/(9!)-…
cos(x)=1-(x^2)/(2!)+(x^4)/(4!)-(x^6)/(6!)+…
This is very parallelizable; just take into account the error you commit when truncating the series.
Please refer to http://www.efunda.com/math/taylor_series/trig.cfm for the series of other trigonometric / exponential functions. Hope it is useful.
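The two series above can be sketched in plain C as follows. This is only an illustration: the function names and the "terms" parameter are my own, and each term is built from the previous one instead of computing powers and factorials separately.

```c
/* Truncated Maclaurin series for sin(x): x - x^3/3! + x^5/5! - ...
   `terms` is how many series terms are kept (hypothetical parameter). */
double sin_series(double x, int terms)
{
    double term = x;    /* first term, x^1/1! */
    double sum  = 0.0;
    for (int n = 0; n < terms; n++) {
        sum += term;
        /* next term = current term * (-x^2) / ((2n+2)(2n+3)) */
        term *= -x * x / ((2.0 * n + 2.0) * (2.0 * n + 3.0));
    }
    return sum;
}

/* Truncated Maclaurin series for cos(x): 1 - x^2/2! + x^4/4! - ... */
double cos_series(double x, int terms)
{
    double term = 1.0;  /* first term, x^0/0! */
    double sum  = 0.0;
    for (int n = 0; n < terms; n++) {
        sum += term;
        /* next term = current term * (-x^2) / ((2n+1)(2n+2)) */
        term *= -x * x / ((2.0 * n + 1.0) * (2.0 * n + 2.0));
    }
    return sum;
}
```

As long as the terms are shrinking (which holds for small |x|), the truncation error of these alternating series is no larger than the first term left out, so the error grows quickly with |x| for a fixed number of terms.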
Using Taylor or Maclaurin series is not the right approach.
You want to use minimax polynomials and the Remez algorithm to find their coefficients.
hey,
I know this was posted yonks ago, but I thought I'd post anyway. It took me about 2 days of work until I realised that there was a double-precision version of cos, sin, tan etc. written ALREADY!!! I'm guessing it wasn't written at the time this message was posted.
Anyway, since I spent so much time on it, I thought I'd share my code for the cos function. It is parallelised in almost every way I can think of.
[codebox]#include <cuda.h>
#include <cuda_runtime.h>
#include "liarp.h"
#include "arith_const_types.h" /* DBL_TYPE and DBL_CODE presumably come from these two headers */
#define DOUBLE_EPSILON 2.22045e-016
#define PI ((double)3.1415926535897932384626433832795028841971693993751)
#define TAYLOR_TERMS 10 /* excluding constant */
//factorials (2!, 4!, ..., 20!); the big ones need the LL suffix
#define TWO 2
#define FOUR 24
#define SIX 720
#define EIGHT 40320
#define TEN 3628800
#define TWELVE 479001600
#define FOURTEEN 87178291200LL
#define SIXTEEN 20922789888000LL
#define EIGHTEEN 6402373705728000LL
#define TWENTY 2432902008176640000LL
//the _CUDA suffix avoids clashes with cos/abs/remainder from the math library
__device__ double abs_CUDA (double x);
__device__ double remainder_CUDA (double dividend, double divisor);
__device__ void power (double x, double *taylor_array);
__device__ void cos_taylor (double *x);
//takes the cos of the vector in and returns it in vector out
//launch with blockDim.y == TAYLOR_TERMS and
//blockDim.x*blockDim.y*sizeof(double) bytes of dynamic shared memory, e.g.
//dim3 block(32, TAYLOR_TERMS); cos_CUDA<<<grid, block, 32*TAYLOR_TERMS*sizeof(double)>>>(...)
__global__ void cos_CUDA (void *in, void *out, int code_in, long elements){
    extern __shared__ double di_x[];/*blockDim.y rows of blockDim.x values*/
    DBL_TYPE *di, *dbo = (DBL_TYPE *)out;
    double x = 0.0, sign = 1.0;
    long i = (long)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < elements && code_in == DBL_CODE) {/*only DBL_CODE input is handled*/
        di = (DBL_TYPE *)in;
        x = di[i];
    }
    /*cos function is periodic, repeats itself every 2*PI*/
    x = remainder_CUDA(x, 2*PI);
    /*cos(-x)=cos(x)*/
    x = abs_CUDA(x);
    /*this section exploits the fact that the cos function looks SIMILAR every PI/2:*/
    /*fold x into the first quadrant and remember the sign of the result*/
    if (x >= PI/2 && x < PI){/*second quadrant*/
        x = PI - x;
        sign = -1.0;
    }
    else if (x >= PI && x < 3*PI/2){/*third quadrant*/
        x = x - PI;
        sign = -1.0;
    }
    else if (x >= 3*PI/2){/*fourth quadrant*/
        x = 2*PI - x;
    }/*first quadrant: x remains the same*/
    /*every thread of the block has to reach the barriers inside these two calls*/
    power(x, di_x);
    cos_taylor(di_x);
    if (i < elements && threadIdx.y == 0)
        dbo[i] = sign * di_x[threadIdx.x];
}
//returns (by reference) the cosine (argument in radians)
//answer is saved in first row of the 2D array x (stored flat, row by row)
//expects row j to hold the (2*(j+1))th power of the argument
__device__ void cos_taylor (double *x){
    const long long factorial[TAYLOR_TERMS]={TWO, FOUR, SIX, EIGHT, TEN, TWELVE, FOURTEEN, SIXTEEN, EIGHTEEN, TWENTY};
    int col = threadIdx.x;
    int j = threadIdx.y;
    int k;
    double sign=-1, cosine=1;
    if (j < TAYLOR_TERMS)
        x[j*blockDim.x + col] /= (double)factorial[j];
    __syncthreads();/*all terms must be ready before they are summed*/
    if (j==0){/*ensure one thread is used only*/
        /*this section could be parallelised but i cant be bothered*/
        for (k=0;k<TAYLOR_TERMS;k++){
            cosine+=sign*x[k*blockDim.x + col];/*add up vector but also adjust sign*/
            sign=-sign;/*alternate sign*/
        }
        x[col]=cosine;
    }
    __syncthreads();
}
//returns the absolute value of x
__device__ double abs_CUDA (double x){
    if (x<0)
        return -x;
    else return x;
}
//returns remainder after dividend/divisor (quotient truncated towards zero)
__device__ double remainder_CUDA (double dividend, double divisor){
    long long quotient=(long long)(dividend/divisor);
    return dividend-(double)quotient*divisor;
}
//fills row j of taylor_array with x^(2*(j+1)), j = threadIdx.y
//makes use of the associativity of multiplication so it is done in fewer iterations
__device__ void power (double x, double *taylor_array){
    int col = threadIdx.x;
    int j = threadIdx.y;
    int gap;
    double prev;
    taylor_array[j*blockDim.x + col]=x*x;/*fill up array with value x^2*/
    __syncthreads();
    for (gap=1;gap<TAYLOR_TERMS;gap*=2){/*double gap every iteration*/
        prev=1.0;
        if (j>=gap)/*can only multiply elements after the 'gap'th element*/
            prev=taylor_array[(j-gap)*blockDim.x + col];
        __syncthreads();/*all reads finish before anyone writes*/
        taylor_array[j*blockDim.x + col]*=prev;/*multiply by value 'gap' elements behind*/
        __syncthreads();
    }/*end for*/
}/*end function power*/
[/codebox]
Oh, and don't bother pointing out the fact that no one is actually gonna use this code. I just had to upload it so I could feel as if my time hasn't gone to a total waste!!! And I only used the Taylor (Maclaurin) series, not the other one (minimax, Remez…).