Hello,
I have a project that requires CUDA programming. I have been reading many topics to choose a method for solving systems of linear algebraic equations, such as LU decomposition, but I find it rather hard to do on my own. For those of you who have done this kind of thing in CUDA, what should I do first?
For example, how can I find a parallel algorithm for LU decomposition, and then how do I start programming it in CUDA? I don't actually need to optimize anything or compare CPU and GPU performance. What matters is how to program it and how to use CUDA features in my program, such as zero-copy memory, two devices, and global and shared memory.
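A common starting point for finding the parallelism is the right-looking (Gaussian elimination) form of LU. The sketch below is plain sequential C++ rather than CUDA, and it does no pivoting, so it assumes a matrix that never produces a zero pivot (a diagonally dominant one, for instance). The comments mark the loops whose iterations are independent for a fixed k; those are where CUDA threads would go. In a naive port, step 1 and step 2 would be separate kernel launches per k, because step 2 reads the multipliers that step 1 writes. The name `lu_inplace` and the row-major `std::vector<double>` layout are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <vector>

// Unblocked right-looking LU decomposition without pivoting (Doolittle form).
// a is an n x n row-major matrix. On return it holds U on and above the
// diagonal and the multipliers of L (unit diagonal, not stored) below it.
// Assumes no zero pivot is encountered.
void lu_inplace(std::vector<double>& a, int n) {
    assert(static_cast<int>(a.size()) == n * n);
    for (int k = 0; k < n; ++k) {
        const double pivot = a[k * n + k];

        // Step 1: column of multipliers. Each row i is independent, so on a
        // GPU this could be one thread per row.
        for (int i = k + 1; i < n; ++i) a[i * n + k] /= pivot;

        // Step 2: rank-1 update of the trailing (n-k-1) x (n-k-1) submatrix.
        // Every (i, j) element is independent (row k and column k are only
        // read here), so this could be a 2D grid with one thread per element.
        for (int i = k + 1; i < n; ++i)
            for (int j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
}
```

For A = [[4,3,2],[8,7,9],[4,6,5]] the packed result is [[4,3,2],[2,1,5],[1,3,-12]], that is L = [[1,0,0],[2,1,0],[1,3,1]] and U = [[4,3,2],[0,1,5],[0,0,-12]]. Multiplying L by U gives back A, which is an easy check to run against any GPU version.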
My second question: is LU decomposition easy to adapt to shared memory, and which algorithm should I use? My CUDA experience so far is multiplying two matrices and some vector operations. Any suggestions on how to start?
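On the shared-memory question, one standard approach (not necessarily what any particular library does) is blocked LU: factor a narrow panel of nb columns, solve a small triangular system for the block row to its right, then update the rest of the matrix with a matrix multiply. Because most of the arithmetic ends up in that multiply, a tiled shared-memory matrix-multiply kernel, like the one you have already written, carries most of the load. The sketch below is sequential C++ without pivoting; the function name `lu_blocked` and the parameter `nb` are hypothetical, and real libraries tune the block size per device.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Blocked right-looking LU decomposition without pivoting (a sketch).
// a is an n x n row-major matrix overwritten with L (unit diagonal, not
// stored) below the diagonal and U on and above it. nb is the block size.
// Assumes no zero pivot occurs.
void lu_blocked(std::vector<double>& a, int n, int nb) {
    assert(static_cast<int>(a.size()) == n * n && nb > 0);
    auto at = [&](int i, int j) -> double& { return a[i * n + j]; };

    for (int k0 = 0; k0 < n; k0 += nb) {
        const int kb = std::min(nb, n - k0);  // width of this panel
        const int k1 = k0 + kb;               // first column after the panel

        // 1. Factor the tall panel (columns k0..k1-1, rows k0..n-1) with the
        //    unblocked algorithm. Narrow, and sequential in k.
        for (int k = k0; k < k1; ++k) {
            for (int i = k + 1; i < n; ++i) at(i, k) /= at(k, k);
            for (int i = k + 1; i < n; ++i)
                for (int j = k + 1; j < k1; ++j) at(i, j) -= at(i, k) * at(k, j);
        }

        // 2. Compute U12 by solving L11 * U12 = A12 (forward substitution
        //    with the unit lower-triangular block). Columns j are independent.
        for (int r = k0; r < k1; ++r)
            for (int j = k1; j < n; ++j)
                for (int p = k0; p < r; ++p) at(r, j) -= at(r, p) * at(p, j);

        // 3. Trailing update A22 -= L21 * U12: an ordinary matrix multiply
        //    with inner dimension kb. This is where a shared-memory tiled
        //    GEMM kernel would be used on the GPU.
        for (int i = k1; i < n; ++i)
            for (int j = k1; j < n; ++j) {
                double sum = 0.0;
                for (int p = k0; p < k1; ++p) sum += at(i, p) * at(p, j);
                at(i, j) -= sum;
            }
    }
}
```

With nb = n this reduces to the unblocked algorithm, and in exact arithmetic every block size gives the same L and U, so comparing two block sizes is a quick sanity check.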
There are several libraries you can use to make life easier. ArrayFire is the most comprehensive (but of course I'm biased, because I work on that one). MAGMA and CULA are also available. Most problems can be solved with a library at a fraction of the effort required to write your own kernels.
Do you mean the library that does LU is in Jacket, or just in ArrayFire, and how can I use those libraries? I saw the examples that come with the SDK, but I couldn't compile any of them because there are no instructions for compiling the examples that ship with the Jacket SDK!
Anyway, I'm specifically interested in LU decomposition…
Both Jacket and ArrayFire have functions for LU decomposition.
For MATLAB® code that you want to accelerate with Jacket, see the LU documentation here. The Jacket SDK is not required to run that function.
For C, C++, Fortran, or Python code, ArrayFire has LU decomposition. Here is an example that comes packaged with ArrayFire. The documentation link is in my first post. This function is free to use in single precision, but ArrayFire Pro is required for double precision.
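Library LU routines generally use partial pivoting (LAPACK's `getrf` does, returning the row interchanges alongside L and U), so their output satisfies P*A = L*U rather than A = L*U. When checking a library result or your own kernel, a small CPU reference with pivoting and a residual check can help. The sketch below is one such reference under stated assumptions: the names `lu_partial_pivot` and `lu_residual` are hypothetical, and it stores the permutation as a final row order `perm` (row i of P*A is row perm[i] of A), which is not the same format as LAPACK's swap list or any specific library's pivot output.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// LU decomposition with partial pivoting (CPU reference sketch).
// Overwrites the n x n row-major matrix a with L (unit diagonal, not stored)
// below the diagonal and U on and above it, and fills perm so that row i of
// P*A is row perm[i] of the original A. Returns false if an exactly zero
// pivot is found, which means the matrix is singular.
bool lu_partial_pivot(std::vector<double>& a, int n, std::vector<int>& perm) {
    assert(static_cast<int>(a.size()) == n * n);
    perm.resize(n);
    for (int i = 0; i < n; ++i) perm[i] = i;

    for (int k = 0; k < n; ++k) {
        // Pick the row with the largest |a(i, k)| at or below the diagonal.
        int piv = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(a[i * n + k]) > std::fabs(a[piv * n + k])) piv = i;
        if (a[piv * n + k] == 0.0) return false;

        // Swap whole rows (including multipliers already stored for L)
        // and record the interchange in perm.
        if (piv != k) {
            for (int j = 0; j < n; ++j) std::swap(a[k * n + j], a[piv * n + j]);
            std::swap(perm[k], perm[piv]);
        }

        // Same multiplier and rank-1 update steps as the unpivoted version.
        for (int i = k + 1; i < n; ++i) {
            a[i * n + k] /= a[k * n + k];
            for (int j = k + 1; j < n; ++j) a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return true;
}

// Largest absolute entry of P*A - L*U for a packed LU result.
double lu_residual(const std::vector<double>& orig, const std::vector<double>& lu,
                   const std::vector<int>& perm, int n) {
    double worst = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p <= std::min(i, j); ++p)
                s += (p == i ? 1.0 : lu[i * n + p]) * lu[p * n + j];
            worst = std::fmax(worst, std::fabs(orig[perm[i] * n + j] - s));
        }
    return worst;
}
```

A residual near machine precision (relative to the size of A's entries) indicates a correct factorization; comparing single- and double-precision residuals is also a quick way to see how much precision a given matrix needs.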