"Optimal" algorithm to solve unitary bidiagonal system

I have a system which is T * X = Y, T is tridiagonal. However, for numerical stability, I’ve used the bidiagonal factorization
(B’ * D * B) * X = Y, B bidiagonal with a unit diagonal, D diagonal matrix.

Obviously the easiest way to solve this would be 2 calls to gtsv, but this results in a lot of inefficiency in terms of useless data reads (the 1 main diagonal and 0 off-diagonal). Furthermore, it seems to me that there might be a simpler algorithm to solve bidiagonal matrices than PCR. Is there any algorithm that works only on Bidiagonal matrices that I should know about? Also, is there any other approach one might take to solve this system? Cheers.

Not sure about faster, but doing a bit of googling I came up with MAGMA, which appears to solve those problems (although I don’t know at what numerical cost/complexity):
http://icl.cs.utk.edu/magma/index.html
and a paper by Yao Zhang on the topic, which mentions a few big O metrics for the method you’re referencing:
http://cr-niimm.ksu.ru/attachments/download/16/FastTridiagonalSolversontheGPU.pdf

Edit 1: I see mentions of Thomas algorithm on GPU for solving tridiagonal systems also, but apparently that’s not stable by reasing the next article:

Edit 2: In GTC 2013, there was a on talk on “A Scalable, Numerically Stable, High-performance Tridiagonal Solver Using GPUs” – http://www.gputechconf.com/gtcnew/on-demand-gtc.php

Hey, thanks for replying!
Basically I’m hoping that there are some kind of efficiencies that can be gained by
a) being bidiagonal rather than tridiagonal solves
b) a unitary main diagonal

It’s generally accepted that the cuda tridiag solve is the state-of-the-art as far as tridiagonal solvers go, but I was hoping that maybe some work had been done into the simpler bidiagonal, maybe some algorithms with a lower complexity (PCR, for example, would need half the calculations). And I’m just not good enough at kernels to code one :(

Got it. Not sure if you saw it, but the GTC talk has a link to the code they used for their solver, might be something to check out.

Wow, didn’t pick up on that bit. That’s very generous of them. Thanks for the tip :)