Signal: Segmentation fault in RVI2 phase

Hi all,

I’ve come across this error when trying to compile a large CUDA kernel (~3000 lines of code) that has a LOT of gotos; it’s machine-generated C code.

Signal: Segmentation fault in RVI2 phase.
(0): Error: Signal Segmentation fault in phase RVI2 -- processing aborted
*** Internal stack backtrace:
/usr/local/cuda/open64/lib//be() [0x6dcd8f]
/usr/local/cuda/open64/lib//be() [0x6dd9d9]
/usr/local/cuda/open64/lib//be() [0x6dd12d]
/usr/local/cuda/open64/lib//be() [0x6de376]
/lib/x86_64-linux-gnu/libc.so.6(+0x33d80) [0x2ad8e7dd9d80]
/lib/x86_64-linux-gnu/libc.so.6(+0x784ca) [0x2ad8e7e1e4ca]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x73) [0x2ad8e7e228e3]
/usr/local/cuda/open64/lib//be() [0x5c19c9]
/usr/local/cuda/open64/lib//be() [0x41fafe]
/usr/local/cuda/open64/lib//be() [0x422eb8]
/usr/local/cuda/open64/lib//be() [0x47b9cd]
/usr/local/cuda/open64/lib//be() [0x4044d2]
/usr/local/cuda/open64/lib//be() [0x40515e]
/usr/local/cuda/open64/lib//be() [0x4061f1]
/usr/local/cuda/open64/lib//be() [0x40752d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x2ad8e7dc4eff]
/usr/local/cuda/open64/lib//be() [0x4038da]
nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be died due to signal 4

A Google search turned up nothing useful. Here’s a snippet of the code:

__global__ void fast9Detect(const unsigned char *img, int width, int height, int step, char *ret_is_corner, int b)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;

    if(idx >= width*height) {
        return;
    }

    int xsize = width;
    int ysize = height;

    ret_is_corner[idx] = 0;

    int y = idx / width;
    int x = idx - (y*width);

    if(x < 3 ||  x >= xsize-3 || y < 3 || y >= ysize-3) {
        return;
    }

    int pixel[16];
    makeOffsets(pixel, step);

    const unsigned char* p = &img[y*step + x];

    int cb = *p + b;
    int c_b= *p - b;

    if(p[pixel[0]] > cb)
     if(p[pixel[1]] > cb)
      if(p[pixel[2]] > cb)
       if(p[pixel[3]] > cb)
        if(p[pixel[4]] > cb)
         if(p[pixel[5]] > cb)
          if(p[pixel[6]] > cb)
           if(p[pixel[7]] > cb)
            if(p[pixel[8]] > cb)
             {}
            else
             if(p[pixel[15]] > cb)
              {}
             else
              return;
           else if(p[pixel[7]] < c_b)
            if(p[pixel[14]] > cb)
             if(p[pixel[15]] > cb)
              {}
             else
              return;
            else if(p[pixel[14]] < c_b)
             if(p[pixel[8]] < c_b)
              if(p[pixel[9]] < c_b)
               if(p[pixel[10]] < c_b)
                if(p[pixel[11]] < c_b)
                 if(p[pixel[12]] < c_b)
                  if(p[pixel[13]] < c_b)
                   if(p[pixel[15]] < c_b)
                    {}
                   else
                    return;
                  else
                   return;
                 else
                  return;
                else
                 return;
               else
                return;
              else
               return;
             else
              return;
            else
             return;
           else
            if(p[pixel[14]] > cb)
             if(p[pixel[15]] > cb)
              {}
             else
              return;
            else
             return;
          else if(p[pixel[6]] < c_b)
           if(p[pixel[15]] > cb)
            if(p[pixel[13]] > cb)
             if(p[pixel[14]] > cb)
              {}
             else
              return;

...

(It goes on like this for 3000 lines or so.)

The code is from the OpenCV FAST feature detector, and I’m trying to port it to CUDA. Is there a known issue with very long code? The FAQ says the limit is 2 million instructions, but I’m sure I haven’t exceeded that. I tried turning off optimisation with -O0 and still got the same error.

I’m using CUDA Toolkit 3.2 with gcc 4.5.2. All my other CUDA code compiles fine; only this file fails. I tried the latest CUDA Toolkit 4.0 but found that it does not support gcc 4.5.

I’ve attached the code if anyone wants to try and compile it.

Any help would be appreciated!
CommonDataTypes.h (179 Bytes)
CUDA_FAST9.cu (188 KB)
CUDA_FAST9.h (266 Bytes)

From the compiler output shown, this looks like an internal compiler error. It would be helpful if you could file a bug and attach self-contained repro code so our compiler team can take a look. Since this was observed with CUDA 3.2, there is a chance that the issue is already resolved in CUDA 4.0. I don’t have a machine with a CUDA toolchain available right now, so I can’t try it myself with the latest CUDA 4.0 bits.
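
While the compiler issue is being looked at, one possible workaround is to avoid the generated decision tree altogether and express the FAST-9 test as a short loop. The following is only a sketch in plain C, not the OpenCV code and not a confirmed fix for the crash. The names make_offsets and is_corner9 are made up for illustration, and the offset table assumes the usual 16-pixel, radius-3 circle walked in order; the real makeOffsets is not shown in the post, so its ordering may differ. The loop looks for 9 contiguous circle pixels that are all brighter than *p + b or all darker than *p - b, which is what the generated tree appears to test.

```c
/* Hypothetical stand-in for makeOffsets(): fills pixel[] with the offsets of
   the 16 pixels on a radius-3 circle around the centre, walked in order,
   for an image with `step` bytes per row. The ordering is an assumption. */
static void make_offsets(int pixel[16], int step)
{
    static const int dx[16] = { 0,  1,  2,  3,  3,  3,  2,  1,
                                0, -1, -2, -3, -3, -3, -2, -1};
    static const int dy[16] = { 3,  3,  2,  1,  0, -1, -2, -3,
                               -3, -3, -2, -1,  0,  1,  2,  3};
    int i;
    for (i = 0; i < 16; ++i)
        pixel[i] = dx[i] + dy[i] * step;
}

/* Returns 1 if at least 9 contiguous circle pixels are all brighter than
   *p + b or all darker than *p - b, and 0 otherwise. */
static int is_corner9(const unsigned char *p, const int pixel[16], int b)
{
    int cb  = *p + b;
    int c_b = *p - b;
    int run_bright = 0;
    int run_dark = 0;
    int i;

    /* Walk 16 + 8 positions so that a run wrapping past index 15
       (for example 12..15 followed by 0..4) is still counted. */
    for (i = 0; i < 16 + 8; ++i) {
        int v = p[pixel[i & 15]];
        run_bright = (v > cb)  ? run_bright + 1 : 0;
        run_dark   = (v < c_b) ? run_dark + 1   : 0;
        if (run_bright >= 9 || run_dark >= 9)
            return 1;
    }
    return 0;
}
```

In a .cu file both functions would need a __device__ qualifier to be callable from fast9Detect, and the kernel could then store the result of is_corner9(p, pixel, b) in ret_is_corner[idx] in place of the nested ifs. The loop will likely be slower per pixel than the hand-tuned tree, but it gives the compiler a far smaller function to process.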