If we have a linear system AX = B (where A, X and B are all matrices) and A is symmetric positive definite, we can use Cholesky factorization to solve it.
With A = LL’, where L is lower triangular, the problem becomes LL’X = B.
We then perform a forward substitution followed by a backward substitution, i.e. we solve:
LY = B
L’X = Y
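
For concreteness, a minimal CPU reference sketch of the two triangular solves in C++ might look like the following. The `Matrix` type and the function names `forward_substitute` and `backward_substitute` are hypothetical, dense row-major storage is assumed, and this is not a GPU implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix: element (i, j) lives at data[i * cols + j].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Forward substitution: solve L Y = B, with L lower triangular (n x n).
Matrix forward_substitute(const Matrix& L, const Matrix& B) {
    Matrix Y(B.rows, B.cols);
    for (std::size_t c = 0; c < B.cols; ++c) {        // each right-hand side column
        for (std::size_t i = 0; i < L.rows; ++i) {    // rows from top to bottom
            double s = B(i, c);
            for (std::size_t k = 0; k < i; ++k) s -= L(i, k) * Y(k, c);
            Y(i, c) = s / L(i, i);
        }
    }
    return Y;
}

// Backward substitution: solve L' X = Y, reading L'(i, k) as L(k, i).
Matrix backward_substitute(const Matrix& L, const Matrix& Y) {
    Matrix X(Y.rows, Y.cols);
    for (std::size_t c = 0; c < Y.cols; ++c) {        // each right-hand side column
        for (std::size_t i = L.rows; i-- > 0;) {      // rows from bottom to top
            double s = Y(i, c);
            for (std::size_t k = i + 1; k < L.rows; ++k) s -= L(k, i) * X(k, c);
            X(i, c) = s / L(i, i);
        }
    }
    return X;
}
```

In this sketch the columns of B are solved independently of each other, while the rows within one column depend on the rows computed before them.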
The forward and backward substitutions are also an expensive part of this method. On GPUs, what is the best way to perform them?
Is there any CUDA example available for this case?