How can I use CUDA to solve linear algebraic equations?

Hello,
I have a project that requires CUDA programming, and after reading about many topics I chose solving linear algebraic equations, for example with LU decomposition. It feels rather hard for me, so for people who have done this kind of thing in CUDA: what should I do first?

For example, how do I find a parallel algorithm for LU decomposition, and how do I start programming it in CUDA? I don't actually have to do any optimization or CPU-versus-GPU performance comparison; the point is how to program it and how to use features such as zero copy, two devices, and global and shared memory in my program.
My second question: is LU decomposition a good fit for shared memory, and which algorithm should I use? My CUDA experience so far is multiplying two matrices and some vector operations. Any suggestions on how to start?

Thanks

There are several libraries you can use to make life easier. ArrayFire is the most comprehensive (but of course I'm biased because I work on that one ;) ). MAGMA and CULA are also available. Most problems can be solved with a library, at a fraction of the effort required to write your own kernels.
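
If the project requires writing the factorization yourself, a plain CPU reference is a reasonable first step before porting anything to CUDA. The following is a minimal C++ sketch of in-place LU decomposition with partial pivoting. It is not taken from any of the libraries above; the names `lu_factor` and `lu_solve` are hypothetical, and row-major `float` storage is an assumption. The comments mark which loops have independent iterations and could therefore become CUDA threads.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// In-place LU factorization with partial pivoting (hypothetical helper).
// A is n x n, row-major. On success the strict lower triangle holds L
// (unit diagonal implied) and the upper triangle holds U, with P*A = L*U.
// piv[k] is the row that was swapped with row k at step k.
// Returns false if an exactly zero pivot is found (singular matrix).
bool lu_factor(std::vector<float>& A, std::size_t n,
               std::vector<std::size_t>& piv) {
    assert(A.size() == n * n);
    piv.assign(n, 0);
    for (std::size_t k = 0; k < n; ++k) {
        // Pivot search: a max-reduction over column k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i * n + k]) > std::fabs(A[p * n + k])) p = i;
        piv[k] = p;
        if (A[p * n + k] == 0.0f) return false;
        if (p != k)
            for (std::size_t j = 0; j < n; ++j)
                std::swap(A[k * n + j], A[p * n + j]);

        // Column scaling: every i is independent (one thread per row).
        for (std::size_t i = k + 1; i < n; ++i)
            A[i * n + k] /= A[k * n + k];

        // Trailing update: every (i, j) is independent (2D thread grid).
        for (std::size_t i = k + 1; i < n; ++i)
            for (std::size_t j = k + 1; j < n; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
    }
    return true;
}

// Solve A x = b using the factors and pivots produced by lu_factor.
std::vector<float> lu_solve(const std::vector<float>& LU, std::size_t n,
                            const std::vector<std::size_t>& piv,
                            std::vector<float> b) {
    assert(LU.size() == n * n && b.size() == n && piv.size() == n);
    // Apply the recorded row swaps to b, in the order they were made.
    for (std::size_t k = 0; k < n; ++k) std::swap(b[k], b[piv[k]]);
    // Forward substitution with unit-diagonal L.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j) b[i] -= LU[i * n + j] * b[j];
    // Back substitution with U.
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j) b[i] -= LU[i * n + j] * b[j];
        b[i] /= LU[i * n + i];
    }
    return b;
}
```

The outer loop over `k` is sequential, so a straightforward port would launch kernels once per step. The trailing update does most of the arithmetic and has the same access pattern as the matrix multiply you have already written, which makes it the natural place to try shared-memory tiling.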

Good luck!

[b]Do you mean the library does LU in Jacket, or only in ArrayFire? And how can I use those libraries? I saw the examples that come with the Jacket SDK, but I couldn't compile any of them because there are no build steps for them!
Anyway, I'm specifically interested in LU decomposition...

Thanks
[/b]

Both Jacket and ArrayFire have functions for LU decomposition.

For MATLAB® code that you want to accelerate with Jacket, see LU documentation here. The Jacket SDK is not required to run that function.

For C, C++, Fortran, or Python code, ArrayFire has LU decomposition. Here is an example that comes packaged with ArrayFire. The documentation link is in my first post. You can do single-precision for free on this function, but ArrayFire Pro is required for double-precision.

Good luck!