Parallel Cyclic Reduction

dominik · June 24, 2010, 8:00am

I don’t know the code in CUDPP, but I wouldn’t be surprised if it was based on the first paper below. Both cyclic reduction papers that I know of,
http://graphics.cs.ucdavis.edu/publication…_pub?pub_id=978 and my own (Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid, IEEE Xplore), solve many smaller tridiagonal systems in parallel, i.e., one system per thread block. One could probably come up with a technique for larger systems (similar to the “scan large arrays” example in the SDK), but it would inevitably be much less efficient because of the many additional round trips to off-chip memory.
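To make the "one system per block" layout concrete, here is a minimal sketch of parallel cyclic reduction (PCR), written as plain C++ so it runs without a GPU. It is not the CUDPP code or the exact method of either paper. The inner loop over `i` stands in for the threads of one block (one thread per equation), writing into separate "next" buffers plays the role of the `__syncthreads()` barrier a CUDA kernel would need between reading and storing shared memory, and `pcr_batch` stands in for the grid, one block per system. The names `Tridiag`, `pcr_solve`, `pcr_batch` and `max_residual` are hypothetical. PCR does no pivoting, so the sketch assumes diagonally dominant (or otherwise well-behaved) systems.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One system: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
// assuming a[0] == 0 and c[n-1] == 0.
struct Tridiag {
    std::vector<double> a, b, c, d;
};

// PCR for a single system: what one thread block would do on the GPU.
std::vector<double> pcr_solve(Tridiag sys) {
    const std::size_t n = sys.b.size();
    assert(sys.a.size() == n && sys.c.size() == n && sys.d.size() == n);
    std::vector<double> a2(n), b2(n), c2(n), d2(n);
    // Each step doubles the coupling distance; log2(n) steps decouple everything.
    for (std::size_t s = 1; s < n; s *= 2) {
        for (std::size_t i = 0; i < n; ++i) {      // "one thread per equation"
            const bool hasLo = i >= s;             // equation i - s exists
            const bool hasHi = i + s < n;          // equation i + s exists
            const double alpha = hasLo ? -sys.a[i] / sys.b[i - s] : 0.0;
            const double gamma = hasHi ? -sys.c[i] / sys.b[i + s] : 0.0;
            // Eliminate x[i-s] and x[i+s]; equation i now couples to i +/- 2s.
            a2[i] = hasLo ? alpha * sys.a[i - s] : 0.0;
            c2[i] = hasHi ? gamma * sys.c[i + s] : 0.0;
            b2[i] = sys.b[i] + (hasLo ? alpha * sys.c[i - s] : 0.0)
                             + (hasHi ? gamma * sys.a[i + s] : 0.0);
            d2[i] = sys.d[i] + (hasLo ? alpha * sys.d[i - s] : 0.0)
                             + (hasHi ? gamma * sys.d[i + s] : 0.0);
        }
        // Swapping buffers replaces the barrier between read and write phases.
        sys.a.swap(a2); sys.b.swap(b2); sys.c.swap(c2); sys.d.swap(d2);
    }
    // Off-diagonals are now zero, so every equation reads b[i]*x[i] = d[i].
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = sys.d[i] / sys.b[i];
    return x;
}

// Many independent systems, one "block" each; a GPU grid runs these concurrently.
std::vector<std::vector<double>> pcr_batch(const std::vector<Tridiag>& systems) {
    std::vector<std::vector<double>> out;
    out.reserve(systems.size());
    for (const Tridiag& sys : systems) out.push_back(pcr_solve(sys));
    return out;
}

// Largest |(A x - d)_i|, to check a solution against its original system.
double max_residual(const Tridiag& sys, const std::vector<double>& x) {
    const std::size_t n = sys.b.size();
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double r = sys.b[i] * x[i] - sys.d[i];
        if (i > 0) r += sys.a[i] * x[i - 1];
        if (i + 1 < n) r += sys.c[i] * x[i + 1];
        worst = std::max(worst, std::fabs(r));
    }
    return worst;
}
```

Because every system stays inside one block, all `log2(n)` reduction steps can run out of on-chip shared memory. That is exactly what stops scaling once a single system no longer fits in one block and intermediate results must go back to global memory between steps.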
