Simple SVD for CUDA

Rocky24 · July 21, 2008, 9:33pm

Hello all,

I am new to CUDA and I am doing a research project to compare the power of GPU computing to the CPU for 3D reconstruction. The main algorithm I have to focus on is Singular Value Decomposition. I have searched up and down for an SVD implemented using CUDA or CUBLAS but have yet to find anything. I attempted to take a step-by-step approach to writing my own but am stuck on how to compute the eigenvalues and eigenvectors.

I am hoping that perhaps someone has a simple CUDA SVD, nothing too fancy, that they wouldn’t mind letting me use for the main part of my project, which is the benchmarking. Or perhaps someone knows enough to help me get my code where it needs to be.

Thanks in advance!!

Yao · July 23, 2008, 7:21pm

There is a technical report, “LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs” that talks about GPU LU and QR decomposition, and GPU-specific issues in terms of performance. Hope it helps.

http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-49.pdf

Yao

darkstorm · July 24, 2008, 2:18am

A 512 x 512 SVD in CUDA using the one-sided Jacobi method, with memory access optimized for the GPU. It is faster than Intel MKL SGESVD but slower than SGESDD.
A bidiagonalized input is suggested for better accuracy; the output is U and W*V.
ZhangShu, DouHeng, supplementary issue of <>,ChengDu, 2009.7
cusvd_by_ZhangShu.rar (338 KB)
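
For readers who have not met the one-sided Jacobi method named above, here is a minimal sketch of the idea. It is not the attached code: the names `rotate_pair` and `jacobi_singular_values`, the column-major layout, the tolerance and the one-block-per-column-pair launch are all assumptions made for illustration. Each rotation makes one pair of columns orthogonal; once every pair is orthogonal, the column norms are the singular values.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 128;  // assumed block size, must be a power of two

// One block orthogonalizes columns p and q of the column-major m x n matrix A.
__global__ void rotate_pair(float* A, int m, int p, int q, float tol, int* rotated) {
    __shared__ float sa[THREADS], sb[THREADS], sg[THREADS];
    float* ap = A + (size_t)p * m;
    float* aq = A + (size_t)q * m;
    int t = threadIdx.x;

    // Partial sums of |a_p|^2, |a_q|^2 and a_p . a_q over this thread's rows.
    float a = 0.0f, b = 0.0f, g = 0.0f;
    for (int i = t; i < m; i += THREADS) {
        a += ap[i] * ap[i];
        b += aq[i] * aq[i];
        g += ap[i] * aq[i];
    }
    sa[t] = a; sb[t] = b; sg[t] = g;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (t < s) { sa[t] += sa[t + s]; sb[t] += sb[t + s]; sg[t] += sg[t + s]; }
        __syncthreads();
    }
    float alpha = sa[0], beta = sb[0], gamma = sg[0];

    // Columns already orthogonal to working precision: nothing to do.
    if (fabsf(gamma) <= tol * sqrtf(alpha * beta)) return;
    if (t == 0) *rotated = 1;

    // Rotation angle that zeroes the dot product of the two columns.
    float zeta = (beta - alpha) / (2.0f * gamma);
    float tn = (zeta >= 0.0f ? 1.0f : -1.0f) / (fabsf(zeta) + sqrtf(1.0f + zeta * zeta));
    float c = 1.0f / sqrtf(1.0f + tn * tn);
    float s = c * tn;
    for (int i = t; i < m; i += THREADS) {
        float x = ap[i], y = aq[i];
        ap[i] = c * x - s * y;
        aq[i] = s * x + c * y;
    }
}

// Returns the singular values (unsorted) of the m x n column-major matrix A.
// Error checking of the CUDA calls is omitted to keep the sketch short.
std::vector<float> jacobi_singular_values(std::vector<float> A, int m, int n,
                                          int max_sweeps = 30) {
    float* dA = nullptr;
    int* dflag = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dflag, sizeof(int));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);

    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        cudaMemset(dflag, 0, sizeof(int));
        for (int p = 0; p < n - 1; ++p)
            for (int q = p + 1; q < n; ++q)
                rotate_pair<<<1, THREADS>>>(dA, m, p, q, 1e-5f, dflag);
        int flag = 0;
        cudaMemcpy(&flag, dflag, sizeof(int), cudaMemcpyDeviceToHost);
        if (!flag) break;  // a full sweep without rotations: converged
    }

    cudaMemcpy(A.data(), dA, A.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dflag);

    // Singular values are the norms of the now-orthogonal columns.
    std::vector<float> sigma(n);
    for (int j = 0; j < n; ++j) {
        float sum = 0.0f;
        for (int i = 0; i < m; ++i) sum += A[i + (size_t)j * m] * A[i + (size_t)j * m];
        sigma[j] = std::sqrt(sum);
    }
    return sigma;
}
```

This sketch processes one column pair per launch, so it says nothing about performance. An implementation aiming for speed would presumably rotate many disjoint pairs at once and also accumulate the rotations to obtain V.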

senorbum · July 24, 2008, 2:36pm

In their timings they don’t include mem transfer.

I don’t understand WHY people don’t include this, as it is something you NEED to do if you want to use the GPU. Just keep that in mind when looking at the results.

E.D_Riedijk · July 24, 2008, 5:59pm

Well just as an example, people might generate the input to the SVD on the GPU. I always prefer it when both performance numbers are shown, the ones including & excluding transfer.
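
To make the two kinds of numbers concrete, here is a minimal sketch of timing the same run with and without the host/device copies, using CUDA events. The kernel `scale` is only a hypothetical stand-in for the real work, and `Timings` and `time_scale` are names invented for this example.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real work (for example one step of an SVD):
// multiplies every element of x by a.
__global__ void scale(float* x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

struct Timings {
    float kernel_ms;  // kernel only
    float total_ms;   // upload + kernel + download
};

// Runs scale() on the (non-empty) host vector h and reports both timings.
// Error checking of the CUDA calls is omitted to keep the sketch short.
Timings time_scale(std::vector<float>& h, float a) {
    int n = static_cast<int>(h.size());
    size_t bytes = h.size() * sizeof(float);
    float* d = nullptr;
    cudaMalloc(&d, bytes);  // allocation is deliberately left out of both timings

    cudaEvent_t ev[4];
    for (int e = 0; e < 4; ++e) cudaEventCreate(&ev[e]);

    cudaEventRecord(ev[0]);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(ev[1]);
    scale<<<(n + 255) / 256, 256>>>(d, n, a);
    cudaEventRecord(ev[2]);
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(ev[3]);
    cudaEventSynchronize(ev[3]);  // all earlier events in the stream are done too

    Timings t;
    cudaEventElapsedTime(&t.kernel_ms, ev[1], ev[2]);
    cudaEventElapsedTime(&t.total_ms, ev[0], ev[3]);

    for (int e = 0; e < 4; ++e) cudaEventDestroy(ev[e]);
    cudaFree(d);
    return t;
}
```

Calling `time_scale` on a vector of about a million floats and printing both fields gives the pair of numbers discussed here. Whether the transfer matters depends on whether the input already lives on the GPU.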

As a side note: Anybody have a gaussian elimination version for CUDA lying around? Now I let matlab calculate the inverse of a matrix I generate in CUDA & use the inverse in CUDA again, but I do not need the inverse in my algorithm.

senorbum · July 24, 2008, 6:22pm

At one point in the summer this would have helped me too. I wouldn’t mind taking a look at some code of Gaussian elimination, but don’t need it for anything.

I get sad when I realize that I enjoy looking at linear algebra and related code now :'(

E.D_Riedijk · July 24, 2008, 7:43pm

I have just been reading about FLAME and there are some working notes suggesting that they have working code for CUDA for these kinds of operations. Tomorrow when at work I’ll dig deeper. I already had a CUDA algo in my head, but am afraid that it is quite sub-optimal ;)

vvolkov · January 28, 2009, 6:52am

No, mem transfer is included in timings (if you are talking about my tech report).

Bugfree · October 15, 2009, 6:04pm

Hi,

Did you have any luck in implementing SVD eventually… I am working on a similar project … Would be great if I could get some input…

Thanks in advance!!

vvolkov · October 15, 2009, 9:49pm

Did you check CULAtools? They seem to have SVD: http://www.culatools.com/versions/basic

There was also a relevant paper “Singular value decomposition on GPU using CUDA” by Lahabar and Narayanan in IPDPS’09.

Jimmy_Pettersson · October 16, 2009, 7:15am

Yes, here is the paper. Really good read.

I’ve tested the CULAtools SVD and compared it against some of the timings shown in the paper. CULAtools seems to give the same or slightly better results for larger sizes, even though I have a GTX260 while they use a GTX280 in their paper…
SVD.pdf (177 KB)

Jimmy_Pettersson · October 16, 2009, 7:17am

Oh yeah, I’ve also had a look at Mr. Volkov’s gpu_lapack, very cool stuff!

Bugfree · October 17, 2009, 8:03am

Thanks for the prompt response guys…

The thing is … I have written my own C code for QR factorization and I am nearly done with my C code for SVD using QR factorization … So I am trying to port this C code to CUDA… But I am running into too many issues … even after thoroughly going through the programming guide and several examples… So I was wondering if there is code out there that does not use the LAPACK library, which could be of some assistance… I am running short of time guys, I would really appreciate this…

Many thanks…

Jimmy_Pettersson · October 17, 2009, 8:54am

Hmm… check Mr. Volkov’s first post here: http://forums.nvidia.com/index.php?showtop...&pid=573376

There is a link to his gpu_lapack code there…

Bugfree · October 18, 2009, 3:48pm

Jim, could you tell me the best way to write the following piece of code on the device…

for (i = k; i <= Row - 1; i++)
{
    tempp = i * Row + j;
    Q[tempp] = Q[tempp] - 2.0 * tempt * R[i * Col + k];
}

I’ve tried putting idy in place of Col, but that doesn’t work … I would like the above code to run in parallel as tempp changes… Any help would be really appreciated…

Thanks in advance…
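
One possible way to run that loop on the device is sketched below. It reads the update as Q[tempp] -= 2.0 * tempt * R[i * Col + k] and assumes float data in row-major arrays; the names `update_column` and `update_column_on_device` are made up for this example. Col looks like the row stride of R rather than a loop counter, so the thread index takes the place of the loop variable i, not of Col. That may be why substituting idy for Col did not work.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per loop iteration: thread number t handles row i = k + t.
__global__ void update_column(float* Q, const float* R, int Row, int Col,
                              int j, int k, float tempt) {
    int i = k + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < Row) {  // replaces the loop bound i <= Row - 1
        int tempp = i * Row + j;
        Q[tempp] -= 2.0f * tempt * R[i * Col + k];
    }
}

// Host wrapper for trying the kernel out. It copies Q and R to the device,
// runs the update and copies Q back. In a real QR code the matrices would
// presumably stay on the device between steps. Error checking is omitted.
void update_column_on_device(std::vector<float>& Q, const std::vector<float>& R,
                             int Row, int Col, int j, int k, float tempt) {
    int count = Row - k;  // number of loop iterations
    if (count <= 0) return;

    float* dQ = nullptr;
    float* dR = nullptr;
    cudaMalloc(&dQ, Q.size() * sizeof(float));
    cudaMalloc(&dR, R.size() * sizeof(float));
    cudaMemcpy(dQ, Q.data(), Q.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dR, R.data(), R.size() * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (count + threads - 1) / threads;
    update_column<<<blocks, threads>>>(dQ, dR, Row, Col, j, k, tempt);

    cudaMemcpy(Q.data(), dQ, Q.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dQ);
    cudaFree(dR);
}
```

Neighbouring threads here touch elements of Q that are Row floats apart, so the accesses are probably not coalesced. If the surrounding loop over j can also be mapped to threads, a 2D grid with j on the x index would likely access memory more efficiently.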
