Hi!
I’ve been working on the algorithm to compute the determinant of a non-symmetric structured matrix
using the displacement structure approach.
Without getting deep into details, the algorithm is similar to that of standard
Gaussian elimination but performed on matrix generators instead of the matrix itself.
It performs LDU-factorization of a matrix, then the product of diagonal entries yields the determinant.
A simplified algorithm is given below (here an n x n matrix
is represented by 4 generator columns ‘a’, ‘b’, ‘c’ and ‘d’):
for(i = 0; i < n; i++) {
// get the current top elements of generator columns
a0 = a[i], b0 = b[i], c0 = c[i], d0 = d[i];
for(j = i + 1; j < n; j++) {
// Gaussian elimination via Givens rotations
aj = a0*a[j] - c0*b[j];
bj = b0*..
cj = c0*..
dj = d0*..
a[j] = aj, b[j] = bj, c[j] = cj, d[j] = dj; // update vectors with new values
}
det = det * a0 * c0; // collect the factors of determinant
// shift down vectors a and c
a[k+1] = a[k] forall k = i..n
c[k+1] = c[k] forall k = i..n
}
return det;
so, basically the algorithm performs ‘n’ iterations collecting one factor of the determinant at a time.
In each iteration the length of generators (vectors a, b, c and d) decreases by 1.
In my current implementation the inner loop is completely vectorized because
element updates of vectors a, b, c and d can proceed independently.
In other words, I start with ‘n’ working threads where one thread updates one entry of each vector.
In each iteration of the outer loop
the # of working threads decreases by 1. Shifting down the vectors ‘a’ and ‘c’ is done in shared memory.
However this solution is not scalable: I can only handle matrices up to 2048 size, ie., max 512 threads per block, each thread
operates on 4 elements at a time.
So, I need some way to distribute the problem over different blocks which is not easy because
in each iteration of the outer loop the top elements a0, b0, c0 and d0 of generators
must be shared between all threads.
I wonder if anyone have ideas how to deal with this problem ?
thanks