I am attempting to implement a fairly complicated calculation, involving the inverse of a 5x5 matrix with complex elements. I calculate the determinant of the matrix by hand, with a large number of lines like this:
det += n15*n24*n33*n42*n51;
where nIJ is the (I, J)th element of the matrix. (It’s not my fault; I inherited this code from someone else and am porting it to CUDA.) However, the program crashes if I include too many of these lines. No particular line causes the problem: if I comment out the first three, I can add three more at the end without causing a crash. So the crash seems to depend on how many terms there are rather than on any one of them, and I suspect some sort of memory issue with spill loads or stores. Does anyone have experience with nvcc getting confused, or running out of “spill memory”, if the code uses too many local variables? If so, how did you fix it?
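For reference, each hand-written line is one signed term of the permutation expansion of the determinant, which has 120 terms for a 5x5 matrix. Below is a minimal host-side sketch of that same sum written as a loop, which could serve as a check on the device result. It assumes std::complex<double> as a stand-in for the complex type used on the device, and the names cplx, Mat5 and det5_reference are made up for illustration; it is not the inherited code.

```cpp
#include <algorithm>
#include <array>
#include <complex>

using cplx = std::complex<double>;                 // stand-in for the device complex type
using Mat5 = std::array<std::array<cplx, 5>, 5>;   // n[row][col], zero-based

// Hypothetical host-side reference: sums sign(p) * n[0][p[0]] * ... * n[4][p[4]]
// over all 120 permutations p of the column indices.
cplx det5_reference(const Mat5& n) {
    std::array<int, 5> p = {0, 1, 2, 3, 4};
    cplx det(0.0, 0.0);
    do {
        // The sign of a permutation is the parity of its inversion count.
        int inversions = 0;
        for (int i = 0; i < 5; ++i)
            for (int j = i + 1; j < 5; ++j)
                if (p[i] > p[j]) ++inversions;

        cplx term((inversions % 2 == 0) ? 1.0 : -1.0, 0.0);
        for (int row = 0; row < 5; ++row)
            term *= n[row][p[row]];
        det += term;
    } while (std::next_permutation(p.begin(), p.end()));
    return det;
}
```

In this form the term n15*n24*n33*n42*n51 corresponds to the column order 5,4,3,2,1, which has ten inversions and therefore a positive sign, matching the `+=` in the line above.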
I’ve tried reorganising the code to use an accumulator variable instead of the quintuple multiplication above, so that the line looks like this:
devcomplex<double> acc(0, 0); acc = n15; acc *= n24; acc *= n33; acc *= n42; acc *= n51; det += acc;
where ‘devcomplex’ is my complex-number type; this allows me to include more lines, but not enough for the whole calculation. Does anyone have advice along these lines for reducing the memory footprint of the calculation?
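To make the accumulator pattern concrete, here is a minimal self-contained sketch of the same idea wrapped in a helper. It assumes std::complex<double> in place of devcomplex<double> so that it compiles on the host, and product5 is a hypothetical name. Whether this form changes register use or spilling on the device is untested here.

```cpp
#include <complex>

using cplx = std::complex<double>;   // stand-in for devcomplex<double>

// Hypothetical helper: multiplies five elements one at a time into a single
// accumulator, mirroring the rewritten line above.
inline cplx product5(const cplx& a, const cplx& b, const cplx& c,
                     const cplx& d, const cplx& e) {
    cplx acc = a;
    acc *= b;
    acc *= c;
    acc *= d;
    acc *= e;
    return acc;
}
```

With such a helper, each term would read `det += product5(n15, n24, n33, n42, n51);`.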
I’ve reproduced this problem with CUDA 4.2 and 5.0, on both a Tesla C2050 and a Tesla C2070.
My next approach will be to see if I can outsource the matrix inversion to cuBLAS. Any other ideas are also welcome.