Hi all,
I’ve come across this error when trying to compile a large CUDA kernel (~3000 lines of code). It is machine-generated C code and has a LOT of gotos.
Signal: Segmentation fault in RVI2 phase.
(0): Error: Signal Segmentation fault in phase RVI2 – processing aborted
*** Internal stack backtrace:
/usr/local/cuda/open64/lib//be() [0x6dcd8f]
/usr/local/cuda/open64/lib//be() [0x6dd9d9]
/usr/local/cuda/open64/lib//be() [0x6dd12d]
/usr/local/cuda/open64/lib//be() [0x6de376]
/lib/x86_64-linux-gnu/libc.so.6(+0x33d80) [0x2ad8e7dd9d80]
/lib/x86_64-linux-gnu/libc.so.6(+0x784ca) [0x2ad8e7e1e4ca]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x73) [0x2ad8e7e228e3]
/usr/local/cuda/open64/lib//be() [0x5c19c9]
/usr/local/cuda/open64/lib//be() [0x41fafe]
/usr/local/cuda/open64/lib//be() [0x422eb8]
/usr/local/cuda/open64/lib//be() [0x47b9cd]
/usr/local/cuda/open64/lib//be() [0x4044d2]
/usr/local/cuda/open64/lib//be() [0x40515e]
/usr/local/cuda/open64/lib//be() [0x4061f1]
/usr/local/cuda/open64/lib//be() [0x40752d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x2ad8e7dc4eff]
/usr/local/cuda/open64/lib//be() [0x4038da]
nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be died due to signal 4
A Google search turned up nothing useful. Here’s a snippet of the code:
__global__ void fast9Detect(const unsigned char *img, int width, int height, int step, char *ret_is_corner, int b)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    if(idx >= width*height) {
        return;
    }
    int xsize = width;
    int ysize = height;
    ret_is_corner[idx] = 0;
    int y = idx / width;
    int x = idx - (y*width);
    if(x < 3 || x >= xsize-3 || y < 3 || y >= ysize-3) {
        return;
    }
    int pixel[16];
    makeOffsets(pixel, step);
    const unsigned char* p = &img[y*step + x];
    int cb = *p + b;
    int c_b= *p - b;
    if(p[pixel[0]] > cb)
      if(p[pixel[1]] > cb)
        if(p[pixel[2]] > cb)
          if(p[pixel[3]] > cb)
            if(p[pixel[4]] > cb)
              if(p[pixel[5]] > cb)
                if(p[pixel[6]] > cb)
                  if(p[pixel[7]] > cb)
                    if(p[pixel[8]] > cb)
                      {}
                    else
                      if(p[pixel[15]] > cb)
                        {}
                      else
                        return;
                  else if(p[pixel[7]] < c_b)
                    if(p[pixel[14]] > cb)
                      if(p[pixel[15]] > cb)
                        {}
                      else
                        return;
                    else if(p[pixel[14]] < c_b)
                      if(p[pixel[8]] < c_b)
                        if(p[pixel[9]] < c_b)
                          if(p[pixel[10]] < c_b)
                            if(p[pixel[11]] < c_b)
                              if(p[pixel[12]] < c_b)
                                if(p[pixel[13]] < c_b)
                                  if(p[pixel[15]] < c_b)
                                    {}
                                  else
                                    return;
                                else
                                  return;
                              else
                                return;
                            else
                              return;
                          else
                            return;
                        else
                          return;
                      else
                        return;
                    else
                      return;
                  else
                    if(p[pixel[14]] > cb)
                      if(p[pixel[15]] > cb)
                        {}
                      else
                        return;
                    else
                      return;
                else if(p[pixel[6]] < c_b)
                  if(p[pixel[15]] > cb)
                    if(p[pixel[13]] > cb)
                      if(p[pixel[14]] > cb)
                        {}
                      else
                        return;
...
(it goes on like this for 3000 lines or so)
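The snippet calls makeOffsets without showing it. Below is a minimal sketch of what such a helper could look like, assuming it fills pixel with the 16 offsets of the radius-3 circle around p, in the same order as the reference FAST code. The FAST_HD macro is a hypothetical name introduced here so the same function compiles on the host and under nvcc; the actual definition in the attached CUDA_FAST9.cu may differ.

```cpp
// Hypothetical macro: expands to CUDA qualifiers only when compiled by nvcc,
// so the helper can also be compiled and checked on the host.
#ifdef __CUDACC__
#define FAST_HD __host__ __device__
#else
#define FAST_HD
#endif

// Assumed layout: pixel[i] is the linear offset from the centre pixel p to the
// i-th point of the 16-point, radius-3 circle, given a row stride of `step` bytes.
FAST_HD inline void makeOffsets(int pixel[16], int step)
{
    // Column (dx) and row (dy) displacement of each circle point.
    const int dx[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3, -3, -3, -2, -1 };
    const int dy[16] = { 3, 3, 2, 1, 0, -1, -2, -3, -3, -3, -2, -1, 0, 1, 2, 3 };
    for (int i = 0; i < 16; ++i)
        pixel[i] = dx[i] + dy[i] * step;
}
```

Because every offset reaches at most 3 pixels in either direction, the border check on x and y in the kernel (3 pixels on each side) keeps all 16 reads inside the image.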
The code is from the OpenCV FAST feature detector, which I’m trying to port to CUDA. Is there a known issue with very long kernels? The FAQ says the limit is 2 million instructions, and I’m sure I haven’t exceeded that. I also tried turning off optimisation with -O0 and still get the same error.
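For readers unfamiliar with FAST: the generated if/else tree is an unrolled form of the FAST-9 segment test, which accepts a pixel when 9 contiguous points on the 16-point circle are all brighter than p + b or all darker than p - b. A naive loop version of that test is sketched below for comparison. The name isCorner9Naive is hypothetical, this is not the OpenCV code, and it is not known whether it avoids the compiler crash.

```cpp
// Hypothetical helper, not part of OpenCV or the attached files.
// Returns true if 9 contiguous circle points are all brighter than *p + b
// or all darker than *p - b. `pixel` holds the 16 offsets relative to p.
// Add __device__ in front to call it from a kernel.
inline bool isCorner9Naive(const unsigned char* p, const int pixel[16], int b)
{
    const int cb  = *p + b;   // "brighter" threshold
    const int c_b = *p - b;   // "darker" threshold
    int brighter = 0;         // current run length of brighter points
    int darker   = 0;         // current run length of darker points

    // Visit 16 + 8 positions so that runs wrapping past index 15 are counted:
    // a run of 9 starting at index 15 ends at position 23.
    for (int i = 0; i < 16 + 8; ++i) {
        const int v = p[pixel[i & 15]];
        brighter = (v > cb)  ? brighter + 1 : 0;
        darker   = (v < c_b) ? darker + 1   : 0;
        if (brighter >= 9 || darker >= 9)
            return true;
    }
    return false;
}
```

The decision tree reaches the same answer with far fewer pixel reads on average, which is why the generated code is so long; the loop trades that speed for a kernel of a few lines.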
I’m using CUDA Toolkit 3.2 with gcc 4.5.2. All my other CUDA code compiles fine; only this kernel fails. I tried the latest CUDA Toolkit 4.0, but found that it does not support gcc 4.5.
I’ve attached the code if anyone wants to try and compile it.
Any help would be appreciated!
CommonDataTypes.h (179 Bytes)
CUDA_FAST9.cu (188 KB)
CUDA_FAST9.h (266 Bytes)