How can I use CUDA to solve linear algebraic equations?

Hello,
I have a project that requires CUDA programming. I have been reading many topics to choose a method for solving linear algebraic equations, such as LU decomposition, but I find it rather hard to do. For those of you who have done this kind of thing in CUDA, what should I do first?
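Here is the standard LU approach written out as a small sketch. It uses the Doolittle form without the row pivoting a robust solver would add. You factor A once, then solve two triangular systems. The part that matters for parallelizing it is the inner update. For a fixed step k, every multiplier \ell_{ik} and every update of a_{ij} with i, j > k is independent of the others, so each one can go to its own CUDA thread. The steps k themselves must still run in order.

```latex
A = LU, \qquad L y = b \;\; (\text{forward substitution}), \qquad U x = y \;\; (\text{back substitution})

\text{for } k = 0, \dots, n-2: \qquad
\ell_{ik} = \frac{a_{ik}}{a_{kk}} \;\; (i > k), \qquad
a_{ij} \leftarrow a_{ij} - \ell_{ik}\, a_{kj} \;\; (i, j > k)
```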

For example, how can I find a parallel algorithm for LU decomposition, and then how do I start programming it in CUDA? I don't actually need to optimize anything or compare performance between the CPU and the GPU. What matters is how to program it and how to use CUDA features in my program, such as zero-copy memory, two devices, and global and shared memory.
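Below is a minimal CUDA sketch of a parallel LU that uses only global memory. It assumes a row-major n x n float matrix and no pivoting, so it is only safe for matrices whose pivots never become zero (diagonally dominant ones, for example). The host loops over the steps k and launches two kernels per step. The first computes the multipliers l_ik = a_ik / a_kk. The second applies a_ij -= l_ik * a_kj to the trailing submatrix, with one thread per element. The names `scale_column`, `update_trailing` and `lu_solve_gpu` are hypothetical and were chosen for this example. The triangular solves run on the host to keep things simple.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Step k, part 1: multipliers l_ik = a_ik / a_kk for rows i > k,
// stored in place below the diagonal in column k.
__global__ void scale_column(float *A, int n, int k) {
    int i = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) A[i * n + k] /= A[k * n + k];
}

// Step k, part 2: a_ij -= l_ik * a_kj for i, j > k.
// Every (i, j) is independent, so each thread updates one element.
__global__ void update_trailing(float *A, int n, int k) {
    int j = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    int i = k + 1 + blockIdx.y * blockDim.y + threadIdx.y;
    if (i < n && j < n) A[i * n + j] -= A[i * n + k] * A[k * n + j];
}

// Factors A in place on the GPU (L below the diagonal with an implicit unit
// diagonal, U on and above it), then solves A x = b on the host, overwriting b.
// Returns 0 on success, -1 on a CUDA error. Hypothetical helper, no pivoting.
int lu_solve_gpu(float *A, float *b, int n) {
    assert(n > 0);
    float *dA = nullptr;
    size_t bytes = (size_t)n * n * sizeof(float);
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return -1;
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);

    for (int k = 0; k < n - 1; ++k) {
        int m = n - k - 1;  // size of the trailing submatrix
        scale_column<<<(m + 255) / 256, 256>>>(dA, n, k);
        dim3 block(16, 16);
        dim3 grid((m + 15) / 16, (m + 15) / 16);
        update_trailing<<<grid, block>>>(dA, n, k);  // same stream, so it runs after scale_column
    }
    cudaError_t err = cudaMemcpy(A, dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    if (err != cudaSuccess) return -1;

    // Forward substitution L y = b (unit diagonal), b becomes y.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < i; ++j) b[i] -= A[i * n + j] * b[j];
    // Back substitution U x = y, b becomes x.
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j) b[i] -= A[i * n + j] * b[j];
        b[i] /= A[i * n + i];
    }
    return 0;
}
```

For example, with A = {4, 1, 2, 3} (the matrix [[4, 1], [2, 3]]) and b = {1, 2}, `lu_solve_gpu(A, b, 2)` leaves b close to {0.1, 0.6}. Launching one kernel per step is simple but slow for large n. The sketch is meant to show the structure, not to compete with a library.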
My second question: is LU decomposition easier to map onto shared memory, and which algorithm should I use? My CUDA experience so far is multiplying two matrices and doing some vector operations. Any suggestions on how to start?
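On the shared-memory question, the trailing update in LU (a_ij -= l_ik * a_kj for i, j > k) looks a lot like the matrix multiply you already wrote. Each element needs one value from pivot row k and one multiplier from column k. The hypothetical variant below stages those values in shared memory for each 16 x 16 tile. That way each value is read from global memory once per block instead of once per thread. It is a sketch that assumes a row-major n x n float matrix and no pivoting. `update_trailing_shared` and `lu_factor_gpu_shared` are illustrative names. The result is returned in packed form: L sits below the diagonal with an implicit unit diagonal, and U sits on and above it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // block is TILE x TILE threads

// Multipliers l_ik = a_ik / a_kk for rows i > k (stored in column k).
__global__ void scale_column(float *A, int n, int k) {
    int i = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) A[i * n + k] /= A[k * n + k];
}

// Trailing update with the pivot-row segment and the multipliers for this
// tile cached in shared memory, like the tiles in a shared-memory matmul.
__global__ void update_trailing_shared(float *A, int n, int k) {
    __shared__ float pivot_row[TILE];    // a_kj for this tile's columns
    __shared__ float multipliers[TILE];  // l_ik for this tile's rows
    int j = k + 1 + blockIdx.x * TILE + threadIdx.x;
    int i = k + 1 + blockIdx.y * TILE + threadIdx.y;
    // The first row of threads loads the pivot row; the first column loads the multipliers.
    if (threadIdx.y == 0) pivot_row[threadIdx.x] = (j < n) ? A[k * n + j] : 0.0f;
    if (threadIdx.x == 0) multipliers[threadIdx.y] = (i < n) ? A[i * n + k] : 0.0f;
    __syncthreads();  // every thread reaches this (no early return above)
    if (i < n && j < n) A[i * n + j] -= multipliers[threadIdx.y] * pivot_row[threadIdx.x];
}

// Factors the row-major n x n matrix A in place. Returns 0 on success, -1 on a CUDA error.
int lu_factor_gpu_shared(float *A, int n) {
    assert(n > 0);
    float *dA = nullptr;
    size_t bytes = (size_t)n * n * sizeof(float);
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return -1;
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    for (int k = 0; k < n - 1; ++k) {
        int m = n - k - 1;
        scale_column<<<(m + 255) / 256, 256>>>(dA, n, k);
        dim3 block(TILE, TILE);
        dim3 grid((m + TILE - 1) / TILE, (m + TILE - 1) / TILE);
        update_trailing_shared<<<grid, block>>>(dA, n, k);
    }
    cudaError_t err = cudaMemcpy(A, dA, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    return err == cudaSuccess ? 0 : -1;
}
```

As a check, factoring [[2, 1, 1], [4, 3, 3], [8, 7, 9]] should give L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]] and U = [[2, 1, 1], [0, 1, 1], [0, 0, 2]]. Shared memory pays off much more in blocked LU algorithms, which group many steps into matrix-multiply-like updates. Libraries such as MAGMA are built around that idea.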

Thanks

There are several libraries you can use to make life easier. ArrayFire is the most comprehensive (but of course I'm biased, because I work on that one). MAGMA and CULA are also available. Most problems can be solved with a library at a fraction of the effort required to write your own kernels.
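To illustrate the library route: the libraries named above (ArrayFire, MAGMA, CULA) each have their own API, so this sketch uses cuSOLVER instead. cuSOLVER is NVIDIA's dense solver library and ships with the CUDA Toolkit (version 7.0 and later). It is not one of the libraries mentioned in this thread. The sketch solves A x = b with LU and partial pivoting (`cusolverDnSgetrf` followed by `cusolverDnSgetrs`). Note that A must be in column-major order, as in LAPACK. `lu_solve_cusolver` is a hypothetical wrapper name, and error checks on the individual CUDA calls are omitted for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Solves A x = b with LU + partial pivoting via cuSOLVER; b is overwritten with x.
// A is n x n in COLUMN-MAJOR order. Returns 0 on success, nonzero otherwise.
int lu_solve_cusolver(const float *A, float *b, int n) {
    assert(n > 0);
    cusolverDnHandle_t handle;
    if (cusolverDnCreate(&handle) != CUSOLVER_STATUS_SUCCESS) return -1;

    float *dA = nullptr, *dB = nullptr, *dWork = nullptr;
    int *dIpiv = nullptr, *dInfo = nullptr;
    int lwork = 0, info = -1;
    cudaMalloc(&dA, sizeof(float) * n * n);
    cudaMalloc(&dB, sizeof(float) * n);
    cudaMalloc(&dIpiv, sizeof(int) * n);
    cudaMalloc(&dInfo, sizeof(int));
    cudaMemcpy(dA, A, sizeof(float) * n * n, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, sizeof(float) * n, cudaMemcpyHostToDevice);

    // Query the workspace size, then factor P A = L U in place.
    cusolverDnSgetrf_bufferSize(handle, n, n, dA, n, &lwork);
    cudaMalloc(&dWork, sizeof(float) * lwork);
    cusolverDnSgetrf(handle, n, n, dA, n, dWork, dIpiv, dInfo);
    cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost);

    // info == 0: factorization succeeded (info > 0 means U is exactly singular).
    if (info == 0) {
        cusolverDnSgetrs(handle, CUBLAS_OP_N, n, 1, dA, n, dIpiv, dB, n, dInfo);
        cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost);
        if (info == 0) cudaMemcpy(b, dB, sizeof(float) * n, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dWork); cudaFree(dIpiv); cudaFree(dInfo);
    cusolverDnDestroy(handle);
    return info;
}
```

You must link against the cuSOLVER library, for example `nvcc lu_cusolver.cu -lcusolver`. The library takes care of pivoting and blocking, which is exactly the effort the hand-written kernels above leave out.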

Good luck!

[b]Do you mean the library that does LU in Jacket, or just in ArrayFire? And how can I use those libraries? I saw the examples that came with the Jacket SDK, but I couldn't compile any of them because there are no instructions for compiling them!
Anyway, I'm specifically interested in LU decomposition.

Thanks
[/b]

Both Jacket and ArrayFire have functions for LU decomposition.

For MATLAB(R) code that you want to accelerate with Jacket, see the LU documentation here. The Jacket SDK is not required to run that function.

For C, C++, Fortran, or Python code, ArrayFire has LU decomposition. Here is an example that comes packaged with ArrayFire. The documentation link is in my first post. Single precision is free for this function, but double precision requires ArrayFire Pro.

Good luck!