Problems with exp() functions returning NaN and infinity

I am having issues with the use of exp() functions in CUDA.

I’m unsure how to explain my problem, so I will just show my code.

//use the DE to get separation of the first step
	dev_DE = ((dev_E2 - dev_E1) + (dev_E4 - dev_E3)) / 2;
	dev_Arg1 = -(dev_E2 - dev_E4 + dev_DE) * beta / 2;
	dev_Arg2 = -(dev_E1 - dev_E3 - dev_DE) * beta / 2;
	dev_Arg3 = -(dev_E4 - dev_E2 + dev_DE) * beta / 2;
	dev_Arg4 = -(dev_E3 - dev_E1 - dev_DE) * beta / 2;
	dev_Tn = (exp(dev_Arg1) - exp(dev_Arg2) + exp(dev_Arg3) - exp(dev_Arg4)) / 2;
	dev_s = dev_Tn;
	dev_s2 = dev_Tn * dev_Tn;

All variables are doubles.

The values are all shown correctly right up to the exp() calls; however, as I step through the calculation of dev_Tn, the values turn to ??? in Nsight 2.0.

Then, once everything has been calculated, instead of showing the value of dev_Tn or dev_s it only shows ???.

I am almost positive this is where the NaN and infinity in my results for dev_s/dev_Tn are coming from, since this is the only spot in my code where I am unable to actually see what is going on.
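In case it helps, one thing I was planning to try is to flag non-finite intermediates directly in the kernel with isnan()/isinf(), roughly like this (just a sketch using the same variable names as above; the device-side printf should be available since I build for compute capability 2.0):

    // Sketch only: check the intermediates right after they are computed.
    if (isnan(dev_Arg1) || isinf(dev_Arg1) || isnan(dev_Arg2) || isinf(dev_Arg2))
        printf("bad args: %f %f %f %f\n", dev_Arg1, dev_Arg2, dev_Arg3, dev_Arg4);
    if (isnan(dev_Tn) || isinf(dev_Tn))
        printf("bad Tn: %f\n", dev_Tn);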

Has anyone else run into this problem before?

With the variables replaced by their numeric values, the code looks as follows.

//use the DE to get separation of the first step
	dev_DE = ((dev_E2 - dev_E1) + (dev_E4 - dev_E3)) / 2;
	dev_Arg1 = -(dev_E2 - dev_E4 + dev_DE) * beta / 2;
	dev_Arg2 = -(dev_E1 - dev_E3 - dev_DE) * beta / 2;
	dev_Arg3 = -(dev_E4 - dev_E2 + dev_DE) * beta / 2;
	dev_Arg4 = -(dev_E3 - dev_E1 - dev_DE) * beta / 2;
	dev_Tn = (exp(-0.327973340603972) - exp(0.327973340603972) + exp(-0.327973340603972) - exp(0.327973340603972)) / 2;
	dev_s = dev_Tn;
	dev_s2 = dev_Tn * dev_Tn;

p.s.

If this section is done by hand, it should look like:

dev_Tn = (.720382224 - 1.38815196 + .720382224 - 1.38815196) / 2;
dev_Tn = -0.667769736;
dev_s = dev_Tn;

-0.667769736 is what I should be getting, and it is the correct answer for the data we already have.
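For what it’s worth, since the four arguments are just ±0.327973340603972, the whole expression reduces to exp(-x) - exp(x), i.e. -2*sinh(x). A quick host-only check (plain C, not my actual kernel) agrees with the hand calculation:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Same numbers as above, evaluated on the host for comparison. */
        double x  = 0.327973340603972;
        double Tn = (exp(-x) - exp(x) + exp(-x) - exp(x)) / 2;   /* == exp(-x) - exp(x) */
        printf("%.9f %.9f\n", Tn, -2.0 * sinh(x));               /* both roughly -0.667769740 */
        return 0;
    }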

Any suggestions? This is really confusing me…

I don’t know if this will fix your problem, but you should use expf() instead of exp(). See Appendix C of the CUDA C Programming Guide for more details.

-Arrigo

I had tried that but it still returned the same result.
I tried all of the other suggested functions as well.

It’s quite odd, actually: when I debug this section of code, the debugger reaches the line with the exp() calls and stops there.
Each time I step forward it wipes out the values of the args and, in the end, returns nothing.

Anyone else have any insight?

Sorry to bump but I really need some help with this.

Are you passing the right compiler flag to enable double precision (-arch sm_13 or -arch sm_20)?
Is your hardware capable of running double precision?
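For example, a typical command line would look something like this (the file names here are just placeholders):

    nvcc -arch sm_20 -o myprog myprog.cu

If I remember right, without an arch flag nvcc defaults to compute capability 1.0 and demotes double to float (it prints a warning when it does).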

My command line is as follows.

Runtime API (NVCC Compilation Type is hybrid object or .c file)

set CUDAFE_FLAGS=--sdk_dir "C:\Program Files\Microsoft SDKs\Windows\v6.0A"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -G0 --keep-dir "Debug" -maxrregcount=0 --machine 64 --compile -D_NEXUS_DEBUG -g -deviceemu -D_DEVICEEMU -Xcompiler "/EHsc /nologo /Od /Zi /MDd " -o "Debug%(Filename)%(Extension).obj" "%(FullPath)"

I’m using CUDA 4.0 and compute_20,sm_20.

I have a Tesla C2050 for calculations with a Quadro FX 380 LP for the monitor.
I’m running Windows 7 64-bit.

Sorry, I am not a Windows user, so I have no idea whether those flags are correct.

If you write a very simple repro case and post the full code, someone may be able to help.
An error in the math library for exp is highly unlikely.
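When you put the repro together, it is also worth checking the launch itself for errors. Something roughly like this after the kernel call (sketch only; the kernel name and launch configuration are placeholders):

    my_kernel<<<grid, block>>>(d_a);
    cudaError_t err = cudaGetLastError();           // catches launch/configuration errors
    if (err != cudaSuccess) printf("launch error: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();                  // catches errors raised while the kernel runs
    if (err != cudaSuccess) printf("kernel error: %s\n", cudaGetErrorString(err));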

Are there restrictions on the expf() function? I have not found any, but I have stumbled across mentions of problems when using expf() on values close to zero. Is there any truth to that?

I’m not trying to say that there is an issue in the math libraries; I’m just baffled as to why I am getting incorrect results.

I will try to reproduce the error that I’m getting in a simpler form.

If you are using double, you should use exp not expf.
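Just to make the distinction explicit (a sketch, not your code): exp() is overloaded for double in device code, while expf() is the single-precision version, so a double argument passed to expf() is rounded to float before the call.

    double x = 0.327973340603972;
    double d = exp(x);            // double-precision exponential
    float  f = expf((float)x);    // single-precision; the argument is rounded to float first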

I had originally used exp() instead of expf().
I realize the second is for floats.

None of the changes to that part make any difference.

I think you should check if your setup is correct.

This is a simple example:

#include <stdio.h>
#include <stdlib.h>

__global__ void trivial_kernel(double *a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Same expression as above, with the numeric arguments hard-coded.
    a[idx] = (exp(-0.327973340603972) - exp(0.327973340603972) + exp(-0.327973340603972) - exp(0.327973340603972)) / 2;
}

int main(void)
{
    int n = 1;
    int nbytes = n * sizeof(double);
    double *a = (double*)malloc(nbytes);   // host result buffer
    double *d_a = 0;                       // device result buffer

    cudaMalloc((void**)&d_a, nbytes);
    trivial_kernel<<<1,1>>>(d_a);
    cudaMemcpy(a, d_a, sizeof(double), cudaMemcpyDeviceToHost);

    printf("%14.9f \n", a[0]);

    free(a);
    cudaFree(d_a);
    return 0;
}

If you compile this code with

nvcc -arch sm_20 -o exp_test exp_test.cu

and then run it and compare the result to bc, you will see that it computes the correct result:

$ ./exp_test
  -0.667769740
$ bc -l
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
(e(-0.327973340603972) - e(0.327973340603972) + e(-0.327973340603972) - e(0.327973340603972)) / 2;
-.66776974009882395605

If you are not able to replicate this, you are probably passing the wrong flag to the compiler or running on the wrong GPU.
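One way to confirm you are actually running on the C2050 rather than the Quadro is to query the device at runtime. A minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        int dev = 0;
        cudaGetDevice(&dev);                 // device the runtime will use
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Double precision needs compute capability >= 1.3; the C2050 is 2.0.
        printf("device %d: %s (compute %d.%d)\n", dev, prop.name, prop.major, prop.minor);
        return 0;
    }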

I ran the test kernel and it produced correct results.

I’m still running a few tests to try to pin down the cause of my problem.
If I discover anything new, I will post back.

Thank you for the help!