Common Runtime Math Function Not Working

samritmaity · March 29, 2008, 10:32am

Dear all,
I got the following code from the CUDA Programming Guide, but it is not producing the correct output. Does anybody have a clue why the code behaves like this? The code and the corresponding output are below. Thanks in advance for any help or advice.
with regards
sam :(

---------------------source code------------------------------------------------------------
/* CUDA program that uses the GPU to compute the arccosine of numbers */

/* --------------------------- header section ----------------------------*/
#include<stdio.h>
#include<stdlib.h>
#include<cuda.h>

#define ACOS_THREAD_CNT 10
#define N 100

/* --------------------------- target code ------------------------------*/
struct acosParams {
float *arg;
float *res;
int n;
};

__global__ void acos_main(struct acosParams parms)
{
int i;
for (i = threadIdx.x; i < parms.n; i += ACOS_THREAD_CNT) {
parms.res[i] = acosf(parms.arg[i] ) ;
}
}

/* --------------------------- host code ------------------------------*/
int main (int argc, char *argv[])
{
int i = 0;
cudaError_t cudaStat;
float *acosRes = 0;
float *acosArg = 0;
float* arg = (float *) malloc(N * sizeof(arg[0]));
float* res = (float *) malloc(N * sizeof(res[0]));
struct acosParams funcParams;
/* … fill arguments array 'arg' … */
for(i=0; i < N ; i++ ){
arg[i] = (float)i ;
}

cudaStat = cudaMalloc ((void **)&acosArg, N * sizeof(acosArg[0]));
if( cudaStat )
printf(" value = %d : Memory Allocation on GPU Device failed\n", cudaStat);

cudaStat = cudaMalloc ((void **)&acosRes, N * sizeof(acosRes[0]));
if( cudaStat )
printf(" value = %d : Memory Allocation on GPU Device failed\n", cudaStat);

cudaStat = cudaMemcpy (acosArg, arg, N * sizeof(arg[0]), cudaMemcpyHostToDevice);
if( cudaStat )
printf(" value = %d : Memory Copy from Host to Device failed\n", cudaStat);

funcParams.res = acosRes;
funcParams.arg = acosArg;
funcParams.n = N;
acos_main<<<1,ACOS_THREAD_CNT>>>(funcParams);

cudaStat = cudaMemcpy (res, acosRes, N * sizeof(acosRes[0]), cudaMemcpyDeviceToHost);
if( cudaStat )
printf(" value = %d : Memory Copy from Device to Host failed\n", cudaStat);
for(i=0; i < N ; i++ ){
if ( i%10 == 0 )
printf("\n acosf(%f) = %f ", arg[i], res[i] );
}
}

-------------------------command used for compilation-----------------------------------------------

$ nvcc cuda-cos-finding.cu -use_fast_math

-------------------------output-----------------------------------------------------------------------------
$./a.out

acosf(0.001000) = 1.569796
acosf(10.001000) = nan
acosf(20.000999) = nan
acosf(30.000999) = nan
acosf(40.000999) = nan
acosf(50.000999) = nan
acosf(60.000999) = nan
acosf(70.000999) = nan
acosf(80.000999) = nan
acosf(90.000999) = nan

DenisR · March 29, 2008, 11:30am

Here you tell the compiler to use the fast (less accurate) version of cosf. I believe the programming manual states that the fast version of cosf (__cosf) expects input between -pi and pi and that the outcome is undefined elsewhere, but you can check that in the manual.

So if you remove -use_fast_math, it will probably work.

seibert · March 30, 2008, 12:28pm

Actually, this has nothing to do with -use_fast_math. The domain of the acos() function is [-1,1] because that is the range of cos(). There is no angle which can give you a cosine of 10, so all implementations are supposed to return NaN. :)

DenisR · March 30, 2008, 1:16pm

Me = blind ;) That little a did not catch my attention, the -use_fast_math did :)
