nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be returned non-zero status 1

Hi there!!

I am new to CUDA programming and need help with the error in the subject line. I am trying to write a simple app, and the build fails with:

Assertion failure at line 1255 of …/…/be/com/data_layout.cxx:
Compiler Error in file /tmp/tmpxft_0000327a_00000000-7_revised_ann.cpp3.i during Lowering phase:
size of actual area increased from 16 to 32
nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be returned non-zero status 1
make: *** [obj/release/revised_ann.cu.o] Error 255

I searched the net and found no similar reports of this problem.

Here is a snippet of my __global__ function:

__global__ void iterate2(float *d_period, float *d_lump, float *d_annuity, float *d_result)
{
    long double fx  = 0.00;
    long double ffx = 0.00;
    int n = 0; int limit = 60000;

    //long double hun = 100.00 ;
    float xo = 0.0000001;
    float epsilon = 0.00000000000001;
    long double denom = 0;
    long double num = 0;
    long double denom_deriv = 0;
    long double num_deriv = 0;
    float denominator = 0;
    float numerator = 0;

    n = threadIdx.x;

    //fx = compute_annuity2(xo, j, lump, annuity);
    numerator = xo*powf((1+xo), d_period[n]);
    denominator = powf((1+xo), d_period[n]) - 1;
    fx = d_lump[n]*(numerator/denominator) - d_annuity[n];
    while ((fabsf(fx) > epsilon) && n < limit) {
        // xo = ximp;

        // ffx = derivative2(xo,j,lump);
        denom = powf((1.0+xo), d_period[n]) - 1;
        num = xo*powf((xo+1.0), d_period[n]);
        denom_deriv = powf(denom, 2.00);
        num_deriv = denom*(d_period[n]*xo*powf((1.0+xo), (d_period[n]-1)) + powf((1.0+xo), d_period[n]))
                    - num*d_period[n]*powf((1+xo), (d_period[n]-1));

        ffx = d_lump[n]*num_deriv/denom_deriv;
        xo = xo - fx/ffx;
        //fx = compute_annuity(xo,j,lump,annuity);
        numerator = xo*powf((1+xo), d_period[n]);
        denominator = powf((1+xo), d_period[n]) - 1;
        fx = d_lump[n]*(numerator/denominator) - d_annuity[n];
        //printf("xo = %Lf at iteration  %i\n", xo, n);
        //xo = ximp;
        n++;
    }
    d_result[n] = xo;
}
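
For reference, the loop body above is a Newton-Raphson iteration on the annuity equation. Writing x for xo, p for d_period[n], L for d_lump[n] and A for d_annuity[n] (these symbol names are my own shorthand, not from the original post), num is N, denom is D, num_deriv is D N' - N D', denom_deriv is D^2, and ffx is f':

```latex
f(x) = L\,\frac{N(x)}{D(x)} - A, \qquad N(x) = x(1+x)^p, \qquad D(x) = (1+x)^p - 1

N'(x) = (1+x)^p + p\,x\,(1+x)^{p-1}, \qquad D'(x) = p\,(1+x)^{p-1}

f'(x) = L\,\frac{D(x)\,N'(x) - N(x)\,D'(x)}{D(x)^2}

x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
```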

Thanks in advance,

Neil

Assuming that you’re seeing this with the CUDA_2.1 release, please attach a fully buildable test app that reproduces this failure.

Hi, I had the same problem. Like you, I had used 'long double' variables for the best precision.

When I removed the long modifier in the device code, it compiled.

I am still looking for the reason.
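
Following the workaround in the last reply, here is a minimal sketch of the same Newton-Raphson solve using `double` instead of `long double`. As far as I know, `long double` is not supported in CUDA device code (newer nvcc releases warn that it is treated as `double`), and on GPUs older than compute capability 1.3 even `double` is demoted to `float`. The sketch also uses a separate iteration counter instead of reusing the thread index `n`, which in the original kernel both indexes the arrays and counts iterations, so `d_period[n]` and `d_result[n]` move past the thread's own element. The helper names (`annuity_residual`, `annuity_derivative`, `solve_rate`, the `HOST_DEVICE` macro), the extra `count` parameter, the tolerance and the iteration cap are hypothetical choices, not from the original post. The solver is marked `__host__ __device__` under nvcc so it can also be compiled and checked on the CPU with a plain C++ compiler.

```cpp
// Sketch (hypothetical names): Newton-Raphson solve for the periodic rate x in
//   lump * x*(1+x)^p / ((1+x)^p - 1) == annuity
// using double instead of long double. Under nvcc the kernel is built too;
// a plain C++ compiler sees only the host/device solver.
#include <math.h>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// f(x) = lump * x(1+x)^p / ((1+x)^p - 1) - annuity
HOST_DEVICE inline double annuity_residual(double x, double p,
                                           double lump, double annuity)
{
    double g = pow(1.0 + x, p);             // (1+x)^p
    return lump * (x * g / (g - 1.0)) - annuity;
}

// f'(x) by the quotient rule, same expression as num_deriv / denom_deriv.
HOST_DEVICE inline double annuity_derivative(double x, double p, double lump)
{
    double g    = pow(1.0 + x, p);          // (1+x)^p
    double gm1  = pow(1.0 + x, p - 1.0);    // (1+x)^(p-1)
    double den  = g - 1.0;                  // D(x)
    double num  = x * g;                    // N(x)
    double dnum = g + p * x * gm1;          // N'(x)
    double dden = p * gm1;                  // D'(x)
    return lump * (den * dnum - num * dden) / (den * den);
}

// Newton iteration with its own counter, independent of the thread index.
HOST_DEVICE inline double solve_rate(double p, double lump, double annuity,
                                     double x0, double eps, int max_iter)
{
    double x = x0;
    for (int it = 0; it < max_iter; ++it) {
        double fx = annuity_residual(x, p, lump, annuity);
        if (fabs(fx) <= eps) break;         // converged
        x -= fx / annuity_derivative(x, p, lump);
    }
    return x;
}

#ifdef __CUDACC__
// Hypothetical revised kernel: one thread per entry, count = array length.
__global__ void iterate2(const float *d_period, const float *d_lump,
                         const float *d_annuity, float *d_result, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;                 // guard threads past the end
    // Starting guess 1e-7 mirrors the original xo; tolerance and cap are assumptions.
    d_result[i] = (float)solve_rate(d_period[i], d_lump[i], d_annuity[i],
                                    1e-7, 1e-9, 100);
}
#endif
```

On the host, for a loan of 100000 over 360 periods with a payment of 599.55, `solve_rate(360.0, 100000.0, 599.55, 1e-7, 1e-9, 100)` should come out close to 0.005, i.e. 0.5% per period. A launch would look something like `iterate2<<<(count + 255) / 256, 256>>>(d_period, d_lump, d_annuity, d_result, count);`.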