"Optimal" algorithm to solve unitary bidiagonal system

I have a system which is T * X = Y, T is tridiagonal. However, for numerical stability, I’ve used the bidiagonal factorization
(B’ * D * B) * X = Y, B bidiagonal with a unit diagonal, D diagonal matrix.

Obviously the easiest way to solve this would be 2 calls to gtsv, but this results in a lot of inefficiency in terms of useless data reads (the 1 main diagonal and 0 off-diagonal). Furthermore, it seems to me that there might be a simpler algorithm to solve bidiagonal matrices than PCR. Is there any algorithm that works only on Bidiagonal matrices that I should know about? Also, is there any other approach one might take to solve this system? Cheers.

Not sure about faster, but doing a bit of googling I came up with MAGMA, which appears to solve those problems (although I don’t know at what numerical cost/complexity):
http://icl.cs.utk.edu/magma/index.html
and a paper by Yao Zhang on the topic, which mentions a few big O metrics for the method you’re referencing:

Edit 1: I see mentions of Thomas algorithm on GPU for solving tridiagonal systems also, but apparently that’s not stable by reasing the next article:

Edit 2: In GTC 2013, there was a on talk on “A Scalable, Numerically Stable, High-performance Tridiagonal Solver Using GPUs” – http://www.gputechconf.com/gtcnew/on-demand-gtc.php