nan value in array

Javier_B1 · May 9, 2012, 10:54am

Hello everyone, I am having a weird problem. If someone could read this and give me some ideas, it would be extremely helpful. Thanks.

I am running a kernel where each thread reads several positions in an array (its assigned position and some neighbours). The kernel receives as a parameter the array that each thread reads from.

The input array is allocated and then I use cudaMemset to set every element to 0 (it should be 0 on the first call to the kernel).

I have discovered that, for some positions, the threads read a NaN instead of a 0 when I call the kernel.

I have already checked that the size of the array is correct, so no thread reads past the last position of the array. I have checked the array in two different ways just before the kernel call: first, I copied the array back to the CPU and checked for NaN there; then I ran a kernel that just checks for NaN at each position (all of this just for testing). Neither check found a NaN, but when I call the real kernel right after the check, the NaN appears.
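The two checks described here might look like the following sketch. It is not the poster's code: the names first_nan_host and first_nan_kernel, the array size and the planted NaN are all assumptions made for illustration, and error checking is left out for brevity.

```cuda
#include <climits>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <cuda_runtime.h>

// Host-side check: index of the first NaN in a[0..n), or -1 if there is none.
int first_nan_host(const float *a, int n) {
    for (int i = 0; i < n; ++i)
        if (std::isnan(a[i])) return i;
    return -1;
}

// Device-side check: each thread tests one element and atomicMin keeps the
// smallest offending index. *first must be initialised to INT_MAX.
__global__ void first_nan_kernel(const float *a, int n, int *first) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n && isnan(a[i])) atomicMin(first, i);
}

int main() {
    const int n = 1 << 16;                       // element count (assumed)
    const size_t bytes = n * sizeof(float);      // size in bytes, not elements

    float *d_a = nullptr;
    int *d_first = nullptr;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_first, sizeof(int));
    cudaMemset(d_a, 0, bytes);                   // all-zero bytes are 0.0f

    // Plant one NaN so that both checks have something to find.
    const int planted = 1234;
    const float bad = std::numeric_limits<float>::quiet_NaN();
    cudaMemcpy(d_a + planted, &bad, sizeof(float), cudaMemcpyHostToDevice);

    // Check 1: copy the array back and scan it on the CPU.
    float *h_a = (float *)malloc(bytes);
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
    printf("host check:   first NaN at %d\n", first_nan_host(h_a, n));

    // Check 2: scan the array on the GPU.
    int first = INT_MAX;
    cudaMemcpy(d_first, &first, sizeof(int), cudaMemcpyHostToDevice);
    first_nan_kernel<<<(n + 255) / 256, 256>>>(d_a, n, d_first);
    cudaMemcpy(&first, d_first, sizeof(int), cudaMemcpyDeviceToHost);
    printf("device check: first NaN at %d\n", first == INT_MAX ? -1 : first);

    free(h_a);
    cudaFree(d_a);
    cudaFree(d_first);
    return 0;
}
```

A check like this only covers the n elements it is told about. If the real kernel reads beyond them, both checks pass and the NaN still shows up.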

I don’t really know why these NaNs are appearing, because when I look at the exact same position just before the kernel I get the right value, 0.

Could it be that, because two threads read from the same position (although at different points in the kernel), there is some kind of memory corruption?

I hope you can give me some ideas to try to solve the problem.

Thanks in advance.

Gilles_C · May 9, 2012, 11:11am

Hi,
That’s hard to tell without a code snippet, but as a blind guess, maybe you made a mistake in the size parameter of cudaMemset, cudaMemcpy and/or cudaMalloc, which is in bytes and not in elements. You therefore have to remember to multiply the element count by sizeof(xxx), xxx being the actual type of the data you are referring to.
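A minimal sketch of the bytes-versus-elements point, assuming a float array of 1024 elements. The helpers bytes_for and check are hypothetical names introduced here; cudaMalloc, cudaMemset, cudaMemcpy and cudaFree are the real runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: converts an element count into the byte count that
// cudaMalloc, cudaMemset and cudaMemcpy expect.
template <typename T>
size_t bytes_for(size_t count) { return count * sizeof(T); }

// Hypothetical helper: stop with a message if a runtime call failed.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t n = 1024;                        // number of elements (assumed)
    const size_t bytes = bytes_for<float>(n);     // n * sizeof(float)

    float *d_a = nullptr;
    check(cudaMalloc((void **)&d_a, bytes), "cudaMalloc");

    // Passing n instead of bytes here would clear only the first quarter of
    // the float array and leave the rest uninitialised.
    check(cudaMemset(d_a, 0, bytes), "cudaMemset");

    float *h_a = (float *)malloc(bytes);
    check(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost), "cudaMemcpy");

    int nonzero = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_a[i] != 0.0f) ++nonzero;
    printf("non-zero elements: %d\n", nonzero);

    free(h_a);
    check(cudaFree(d_a), "cudaFree");
    return 0;
}
```

Using one byte count for the allocation, the memset and the copy keeps the three calls consistent with each other.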

tera · May 9, 2012, 4:42pm

As Gilles said, it is difficult to tell without any code. Running your program under cuda-memcheck might provide a hint if the problem is caused by an out-of-bounds access.

amol2 · April 12, 2016, 11:22am

I am new to CUDA programming and I am facing the same problem. I am getting a “-nan” value in a CUDA kernel after a few iterations. The kernel is written as follows:

__global__ void odd_u(double *ae, double *aw, double *as, double *an, double *at, double *ab,
                      double *ap1, double *ap2, double *ap3,
                      double *se, double *sw, double *ss, double *sn, double *st, double *sb,
                      double *u, double *p, double *uo, double *du) {//printf("\neven_u");

int i = blockDim.x * blockIdx.x + threadIdx.x;
int j = blockDim.y * blockIdx.y + threadIdx.y;
int k = blockDim.z * blockIdx.z + threadIdx.z;
int index = i*m*l+j*l+k;
int indexW = (i-1)*m*l+(j)*l+k;
if (indexW > ((n*m*l)-1)) indexW = (n*m*l) -1;
if (indexW < 0) indexW = 0;
int indexE = (i+1)*m*l+(j)*l+k;
if (indexE > ((n*m*l)-1)) indexE = (n*m*l) -1;
if (indexE < 0) indexE = 0;
int indexN = i*m*l+(j-1)*l+k;
if (indexN > ((n*m*l)-1)) indexN = (n*m*l) -1;
if (indexN < 0) indexN = 0;
int indexS = i*m*l+(j+1)*l+k;
if (indexS > ((n*m*l)-1)) indexS = (n*m*l) -1;
if (indexS < 0) indexS = 0;
int indexT = i*m*l+(j)*l+(k-1);
if (indexT > ((n*m*l)-1)) indexT = (n*m*l) -1;
if (indexT < 0) indexT = 0;
int indexB = i*m*l+(j)*l+(k+1);
if (indexB > ((n*m*l)-1)) indexB = (n*m*l) -1;
if (indexB < 0) indexB = 0;

if(i>=0 && i<(n-1))
{
  if(j>0 && j<(m-1))
  {
    if(k>0 && k<(l-1))
    {
      if (((i+j+k)%2)!=0)
      {
        if(i==0){
          u[index]=ae[index]*u[indexE]+an[index]*u[indexS]+as[index]*u[indexN]+ab[index]*u[indexT]+at[index]*u[indexB]+sw[index]-se[index]+ss[index]-sn[index]+sb[index]-st[index]-(p[indexE]-p[index])*deltay*deltaz+((1-alu)*(ap1[index]+ap2[index]+ap3[index])/alu)*uo[index];
        }else
        {
          u[index]=aw[index]*u[indexW]+ae[index]*u[indexE]+an[index]*u[indexS]+as[index]*u[indexN]+ab[index]*u[indexT]+at[index]*u[indexB]+sw[index]-se[index]+ss[index]-sn[index]+sb[index]-st[index]-(p[indexE]-p[index])*deltay*deltaz+((1-alu)*(ap1[index]+ap2[index]+ap3[index])/alu)*uo[index];
          //__syncthreads();
        }
        u[index]=alu*u[index]/(ap1[index]+ap2[index]+ap3[index]);
        __syncthreads();

        //printf("\nodd:%lf",u[index]);
        du[index]=(alu*deltay*deltaz)/(ap1[index]+ap2[index]+ap3[index]);
      }
    }
  }
}
return;
}

I have calculated the size this way: #define sizeT ((N+2)*(M+2)*(L+2)*sizeof(float))

and allocated device memory this way: cudaMalloc((void**) &u, sizeT);

Please help me understand why I am getting “nan” values.

njuffa · April 12, 2016, 4:24pm

NaN (not a number) is the result of an invalid floating-point operation, such as dividing zero by zero, taking the square root or logarithm of a negative number, adding two infinities of opposite sign, or subtracting two infinities of like sign. IEEE-754 compliant floating-point arithmetic propagates NaNs through almost all operations, so a NaN, once created, usually carries through to the final result.

You should therefore be able to follow a NaN back to the place where it is first generated. Once you have located that, you can then fix the reason the NaN was created.
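A small sketch that evaluates a few operations on the device and reports which ones yield NaN under IEEE-754 rules. The names nan_cases and nan_cases_kernel are hypothetical, and the choice of cases is an illustration rather than an exhaustive list.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Fills out[0..5) with the results of a few operations. zero and inf are
// parameters so the expressions are not written with literal constants.
__host__ __device__ void nan_cases(float zero, float inf, float *out) {
    out[0] = zero / zero;           // 0 / 0       -> NaN
    out[1] = inf - inf;             // inf - inf   -> NaN
    out[2] = sqrtf(zero - 1.0f);    // sqrt(-1)    -> NaN
    out[3] = inf + inf;             // inf + inf   -> inf, not NaN
    out[4] = out[0] * 0.0f + 1.0f;  // NaN operand -> NaN result (propagation)
}

// Single-thread kernel that runs the same cases on the GPU.
__global__ void nan_cases_kernel(float zero, float inf, float *out) {
    nan_cases(zero, inf, out);
}

int main() {
    float h_out[5];
    float *d_out = nullptr;
    cudaMalloc((void **)&d_out, sizeof(h_out));

    nan_cases_kernel<<<1, 1>>>(0.0f, INFINITY, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int i = 0; i < 5; ++i)
        printf("case %d: %f (isnan = %d)\n", i, h_out[i],
               (int)std::isnan(h_out[i]));

    cudaFree(d_out);
    return 0;
}
```

The last case is the reason a NaN can be traced backwards: any result computed from a NaN operand is itself NaN, so the earliest NaN in a computation marks where the invalid operation or the bad read happened.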

amol2 · April 18, 2016, 6:43am

I have resolved my problem using cuda-memcheck. Due to a wrong index calculation, the kernel tried to access out-of-bounds memory. Using cuda-memcheck I found the following errors:

Invalid global read of size 4
========= at 0x000004d8 in DisV(float*, float*, float*, float*, float*, float*)
========= by thread (1,0,1) in block (0,0,1)
========= Address 0x400f5cbd8 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time.

Invalid global read of size 4
========= at 0x000004d8 in DisV(float*, float*, float*, float*, float*, float*)
========= by thread (1,0,0) in block (0,0,2)
========= Address 0x400f5cbe4 is out of bounds.

Thanks for the guidance.
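The thread does not show the corrected indexing, so the following is only a sketch of one common way to keep a 3D neighbour access in bounds: test the (i, j, k) coordinates before touching memory, rather than clamping the flattened index afterwards. The names idx3, is_interior and average_neighbours and the grid size are assumptions; the layout follows the index = i*m*l + j*l + k convention used in the kernel posted above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Flattened index into an n x m x l grid stored with k varying fastest.
__host__ __device__ inline int idx3(int i, int j, int k, int m, int l) {
    return i * m * l + j * l + k;
}

// True when (i, j, k) and all six face neighbours lie inside the grid.
__host__ __device__ inline bool is_interior(int i, int j, int k,
                                            int n, int m, int l) {
    return i > 0 && i < n - 1 && j > 0 && j < m - 1 && k > 0 && k < l - 1;
}

// Writes the mean of the six face neighbours for every interior point.
__global__ void average_neighbours(const float *in, float *out,
                                   int n, int m, int l) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;
    int k = blockDim.z * blockIdx.z + threadIdx.z;
    if (!is_interior(i, j, k, n, m, l)) return;   // guard before any access

    out[idx3(i, j, k, m, l)] =
        (in[idx3(i - 1, j, k, m, l)] + in[idx3(i + 1, j, k, m, l)] +
         in[idx3(i, j - 1, k, m, l)] + in[idx3(i, j + 1, k, m, l)] +
         in[idx3(i, j, k - 1, m, l)] + in[idx3(i, j, k + 1, m, l)]) / 6.0f;
}

int main() {
    const int n = 20, m = 18, l = 10;             // grid size (assumed)
    const size_t count = (size_t)n * m * l;
    const size_t bytes = count * sizeof(float);   // same type the kernel uses

    std::vector<float> h_in(count, 1.0f), h_out(count, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, bytes);

    // The grid is rounded up, so some threads fall outside the array and
    // rely on the guard in the kernel.
    dim3 block(8, 8, 4);
    dim3 grid((n + block.x - 1) / block.x,
              (m + block.y - 1) / block.y,
              (l + block.z - 1) / block.z);
    average_neighbours<<<grid, block>>>(d_in, d_out, n, m, l);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    printf("centre value: %f\n", h_out[idx3(n / 2, m / 2, l / 2, m, l)]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The element type used to compute the allocation size has to match the pointer type the kernel receives. Sizing a buffer with sizeof(float) and then indexing it as double would make the upper half of the accesses fall outside the allocation.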
