nan value in array

Hello everyone, I am having a weird problem. If someone could read this and give me some ideas, it would be extremely helpful. Thanks.

I am running a kernel where each thread reads several positions in an array (its assigned position and some neighbours). The kernel receives as a parameter the array each thread reads from.

The array I send as input is allocated, and then I use cudaMemset to set every element to 0 (it should be 0 on the first call to the kernel).

I have discovered that for some positions, when I call the kernel, the threads read a NaN instead of a 0.

I have already checked that the size of the array is ok, so no thread is reading past the end of the array. I have checked the array in two different ways just before the kernel call: first, I copy the array back to the CPU and check for NaN there; then I run a kernel that just checks for NaN at each position (all of this just for testing). Neither of them finds a NaN, but when the real kernel is called right after the check, the NaN appears.

I don’t really know why all these NaNs are appearing, because when I look at the exact same position just before the kernel launch I get the right value, 0.

Could it be that, because two threads read from the same position (although at different points in the kernel), there is some kind of memory clash?

Hope I could get some ideas from you to try and solve the problem.

Thanks in advance.

That’s hard to tell without a code snippet, but as a blind guess I would suggest that maybe you made a mistake in using cudaMemset, cudaMemcpy and/or cudaMalloc with regard to the size parameter, which is in bytes, not elements… Remember to multiply this parameter by sizeof(xxx), xxx being the actual type of the data you are referring to.

As Gilles said it is difficult to tell without any code. Running your program under cuda-memcheck might provide a hint if the problem is caused by an out-of-bounds access.

I am new to CUDA programming and I am facing the same problem: I am getting “-nan” values in a CUDA kernel after a few iterations. The kernel is written as follows:

__global__ void odd_u(double *ae, double *aw, double *as, double *an,
                      double *at, double *ab, double *ap1, double *ap2,
                      double *ap3, double *se, double *sw, double *ss,
                      double *sn, double *st, double *sb, double *u,
                      double *p, double *uo, double *du)
{
    //printf("\neven_u");
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;
    int k = blockDim.z * blockIdx.z + threadIdx.z;

    // n, m, l, deltay, deltaz and alu are assumed to be defined elsewhere
    int index  = i*m*l + j*l + k;

    int indexW = (i-1)*m*l + j*l + k;
    if (indexW > (n*m*l)-1) indexW = (n*m*l)-1;
    if (indexW < 0) indexW = 0;

    int indexE = (i+1)*m*l + j*l + k;
    if (indexE > (n*m*l)-1) indexE = (n*m*l)-1;
    if (indexE < 0) indexE = 0;

    int indexN = i*m*l + (j-1)*l + k;
    if (indexN > (n*m*l)-1) indexN = (n*m*l)-1;
    if (indexN < 0) indexN = 0;

    int indexS = i*m*l + (j+1)*l + k;
    if (indexS > (n*m*l)-1) indexS = (n*m*l)-1;
    if (indexS < 0) indexS = 0;

    int indexT = i*m*l + j*l + (k-1);
    if (indexT > (n*m*l)-1) indexT = (n*m*l)-1;
    if (indexT < 0) indexT = 0;

    int indexB = i*m*l + j*l + (k+1);
    if (indexB > (n*m*l)-1) indexB = (n*m*l)-1;
    if (indexB < 0) indexB = 0;

    if (i >= 0 && i < (n-1))
      if (j > 0 && j < (m-1))
        if (k > 0 && k < (l-1))
          if (((i+j+k) % 2) != 0)
          {
              u[index] = aw[index]*u[indexW] + ae[index]*u[indexE]
                       + an[index]*u[indexS] + as[index]*u[indexN]
                       + ab[index]*u[indexT] + at[index]*u[indexB]
                       + sw[index] - se[index] + ss[index] - sn[index]
                       + sb[index] - st[index]
                       - (p[indexE] - p[index])*deltay*deltaz
                       + ((1-alu)*(ap1[index]+ap2[index]+ap3[index])/alu)*uo[index];
          }
}



I have calculated the size in this way: #define sizeT ((N+2)*(M+2)*(L+2)*sizeof(float))

and I allocate device memory in this way: cudaMalloc((void**) &u, sizeT);

Please help me to understand why I am getting “nan” values.

NaN (not a number) is the result of an invalid floating-point operation, such as dividing zero by zero, taking the square root or logarithm of a negative number, adding two infinities of opposite sign, or subtracting two infinities of like sign. IEEE-754 compliant floating-point arithmetic propagates NaNs through almost all operations, so once created, a NaN propagates to the final result.

You should therefore be able to follow a NaN back to the place where it is first generated. Once you have located that, you can then fix the reason the NaN was created.

I have resolved my problem using cuda-memcheck. Due to a wrong index calculation, the kernel tried to access out-of-bounds memory. Using cuda-memcheck I found the following errors:

Invalid global read of size 4
========= at 0x000004d8 in DisV(float*, float*, float*, float*, float*, float*)
========= by thread (1,0,1) in block (0,0,1)
========= Address 0x400f5cbd8 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time.

Invalid global read of size 4
========= at 0x000004d8 in DisV(float*, float*, float*, float*, float*, float*)
========= by thread (1,0,0) in block (0,0,2)
========= Address 0x400f5cbe4 is out of bounds.

Thanks for guidance.