I also have a similar problem. I have whittled the code down to:
[codebox]__global__ void shape_rhs(double *X_quad, double *lrhs, int n)
{
    int i,ele;
    double func_result[3];
    for(i=0; i<3; i++) {
        func_result[i] = 2*(2-1+0.25)*M_PI*cos(0.5*M_PI*X_quad[(ele*6)+2*i]);
        func_result[i] = func_result[i]*cos(M_PI*X_quad[(ele*6)+2*i+1]);
        func_result[i] = func_result[i]+0.5*M_PI*sin(0.5*M_PI*X_quad[(ele*6)+2*i]);
        func_result[i] = -func_result[i];
    }
    *lrhs=func_result[0];
}
int main(int argc, char **argv) {
    double *k_X_quad, *k_lrhs;
    int n;
    shape_rhs<<<128,128>>>(k_X_quad, k_lrhs, n);
}
[/codebox]
When I compile with nvcc -arch sm_13 crashtest.cu -o crashtest, I get the following error:
[codebox]Signal: Segmentation fault in Global Optimization -- Expression Reshaping phase.
(0): Error: Signal Segmentation fault in phase Global Optimization -- Expression Reshaping -- processing aborted
*** Internal stack backtrace:
/usr/local/cuda/open64/lib//be [0x6a350f]
/usr/local/cuda/open64/lib//be [0x6a4159]
/usr/local/cuda/open64/lib//be [0x6a38ad]
/usr/local/cuda/open64/lib//be [0x6a4af6]
/lib/libc.so.6 [0x2b189a686100]
/usr/local/cuda/open64/lib//be [0x40f418]
/usr/local/cuda/open64/lib//be [0x411cad]
/usr/local/cuda/open64/lib//be [0x411f65]
/usr/local/cuda/open64/lib//be [0x41201d]
/usr/local/cuda/open64/lib//be [0x4138ee]
/usr/local/cuda/open64/lib//be [0x4df673]
/usr/local/cuda/open64/lib//be [0x4df859]
/usr/local/cuda/open64/lib//be [0x42b956]
/usr/local/cuda/open64/lib//be [0x42bbc0]
/usr/local/cuda/open64/lib//be [0x42bcbd]
/usr/local/cuda/open64/lib//be [0x42c076]
/usr/local/cuda/open64/lib//be [0x42c25d]
/usr/local/cuda/open64/lib//be [0x42167a]
/usr/local/cuda/open64/lib//be [0x47821d]
/usr/local/cuda/open64/lib//be [0x4043a2]
/usr/local/cuda/open64/lib//be [0x40502e]
/usr/local/cuda/open64/lib//be [0x406081]
/usr/local/cuda/open64/lib//be [0x4073ad]
/lib/libc.so.6(__libc_start_main+0xf4) [0x2b189a6721c4]
/usr/local/cuda/open64/lib//be [0x4037ea]
nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be died due to signal 4[/codebox]
I can avoid the crash in two ways: by compiling without the -arch flag (nvcc crashtest.cu -o crashtest), in which case compilation appears to succeed, or by moving the last statement out of the loop:
[codebox]__global__ void shape_rhs(double *X_quad, double *lrhs, int n)
{
    int i,ele;
    double func_result[3];
    for(i=0; i<3; i++) {
        func_result[i] = 2*(2-1+0.25)*M_PI*cos(0.5*M_PI*X_quad[(ele*6)+2*i]);
        func_result[i] = func_result[i]*cos(M_PI*X_quad[(ele*6)+2*i+1]);
        func_result[i] = func_result[i]+0.5*M_PI*sin(0.5*M_PI*X_quad[(ele*6)+2*i]);
    }
    for(i=0; i<3; i++)
        func_result[i] = -func_result[i];
    *lrhs=func_result[0];
}
int main(int argc, char **argv) {
    double *k_X_quad, *k_lrhs;
    int n;
    shape_rhs<<<128,128>>>(k_X_quad, k_lrhs, n);
}[/codebox]
With this change, compilation with nvcc -arch sm_13 appears to succeed.
I am using the following nvcc version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Thu_Apr__9_05:05:52_PDT_2009
Cuda compilation tools, release 2.2, V0.2.1221
I have come across this problem several times now. My general strategy for working around it is to break the loop into two parts, which usually works. The problem always seems to occur when using automatic arrays and writing to their elements inside a loop. An alternative workaround is to declare func_result as volatile, which also seems to avoid the crash. Is it possible that the compiler is somehow trying to take the address of a register in this scenario?
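For comparison with the two listings above, here is a minimal sketch of the volatile workaround. The kernel arithmetic is unchanged; the only intended difference is the volatile qualifier on func_result. Several things here are assumptions that are not in the original repro: ele is set to 0 so the kernel reads defined memory, the device buffers are actually allocated and zeroed, the launch uses a single thread, and run_volatile_workaround is a hypothetical helper name. Error checking on the CUDA calls is left out for brevity.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Same arithmetic as the crashing kernel. The only intended change is the
// volatile qualifier on func_result.
// Assumption: ele is set to 0 so the sketch reads defined memory; the
// original repro leaves it uninitialised.
__global__ void shape_rhs(double *X_quad, double *lrhs, int n)
{
    int i;
    int ele = 0;
    volatile double func_result[3];
    for (i = 0; i < 3; i++) {
        func_result[i] = 2*(2-1+0.25)*M_PI*cos(0.5*M_PI*X_quad[(ele*6)+2*i]);
        func_result[i] = func_result[i]*cos(M_PI*X_quad[(ele*6)+2*i+1]);
        func_result[i] = func_result[i]+0.5*M_PI*sin(0.5*M_PI*X_quad[(ele*6)+2*i]);
        func_result[i] = -func_result[i];  // the statement that had to leave the loop
    }
    *lrhs = func_result[0];
}

// Hypothetical host helper (not in the original post): runs the kernel once
// on six zeroed inputs and returns the value written to lrhs.
double run_volatile_workaround()
{
    double *k_X_quad = NULL, *k_lrhs = NULL;
    double result = 0.0;
    cudaMalloc((void **)&k_X_quad, 6 * sizeof(double));
    cudaMalloc((void **)&k_lrhs, sizeof(double));
    cudaMemset(k_X_quad, 0, 6 * sizeof(double));  // all-zero bytes are 0.0
    shape_rhs<<<1, 1>>>(k_X_quad, k_lrhs, 1);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&result, k_lrhs, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(k_X_quad);
    cudaFree(k_lrhs);
    return result;
}

int main()
{
    printf("lrhs = %f\n", run_volatile_workaround());
    return 0;
}
```

With all-zero inputs the expression reduces analytically to -2.5*pi, which gives a quick sanity check on a device with double-precision support (compiled with -arch sm_13 or later). Whether volatile avoids the compiler crash on other toolkit releases is not something this sketch can confirm; it only mirrors the workaround described above.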