Feasibility of a Matlab project

Hi everyone,

I wrote some Matlab code that I would like to accelerate by running it on the GPU. It currently runs on my CPU with the Parallel Computing Toolbox (PCT) and sparse double precision matrices, but I believe sparse single precision computation on the GPU would be faster, and that is something Matlab does not currently support. Before I start working on this, could anyone tell me whether my goal is achievable and which additional libraries I would need (I have read a lot about cuBLAS, cuSPARSE, etc.)? I would need to:

  • Get two matrices A and B from Matlab (approximate minimal size: A is 12000x12000, B is 12000x1)
  • Convert A and B from Matlab’s sparse double precision format to a sparse single precision format (Matlab does not currently support sparse single precision; a rough sketch of this step follows the list)
  • Solve A\B on the GPU (again, Matlab does not offer sparse linear algebra on the GPU)
  • Return the result to Matlab
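
Roughly, the conversion step on the Matlab side would be something like this sketch (just pulling the data out in a GPU-friendly form; nothing here is GPU code yet):

    % Sketch: extract the sparse data in triplet (COO) form and downcast to
    % single, since Matlab has no sparse single precision type of its own.
    [i, j, v] = find(A);      % row indices, column indices, values of A
    vs = single(v);           % values in single precision
    bs = single(full(B));     % B is only 12000x1, so a dense single vector is fine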

Is that something that looks possible? How?

Thanks in advance,

Alex

So by “A\B” do you mean matlab “matrix left divide” i.e. solving a (sparse) system of equations?

Matrix left divide in matlab is not a simple operator. It is a heuristic that chooses an algorithm (and solution method) based on the characteristics of A and B. In the sparse case, it is probably using elements of SuiteSparse, a library that Matlab uses under the hood:

http://www.cise.ufl.edu/research/sparse/SuiteSparse/
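
If you are curious which solver backslash actually picks for your matrix, you can ask Matlab to print its sparse-solver diagnostics (just a quick way to peek under the hood):

    % Turn on the sparse "monitor" diagnostics, run the solve once, turn it off.
    % Matlab prints which sparse factorization/solver mldivide selected.
    spparms('spumoni', 2);
    x = A \ B;
    spparms('spumoni', 0);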

If you contemplate moving this operation to the GPU, be aware that there is no direct equivalent of matrix left divide available in CUDA or the CUDA libraries (AFAIK). There are a variety of techniques for solving a sparse system of equations (e.g. conjugate gradient as an iterative method, if A is SPD), but you would have to pick one of these based on an understanding of your specific case.
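
Before writing any GPU code it may be worth checking, in plain Matlab on the CPU, whether an iterative method even converges for your system. A minimal sketch (tolerances and solver choice are just examples, and pcg only applies if A is SPD):

    % Sketch: try a built-in iterative solver first to see how it behaves.
    tol = 1e-6; maxit = 500;
    if isequal(A, A')                              % symmetric? (A is real)
        [x, flag] = pcg(A, B, tol, maxit);         % conjugate gradient, needs SPD
    else
        [x, flag] = bicgstab(A, B, tol, maxit);    % handles nonsymmetric systems
    end
    % flag == 0 means the solver converged to the requested tolerance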

It’s also possible that under the hood, Matlab is not using an iterative method, but a direct method, because of the relatively “small” size of your sparse system (12000x12000).
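
For reference, a direct method at that size would look roughly like an explicit sparse LU factorization followed by triangular solves, which is close to what backslash does (via UMFPACK) for a square nonsymmetric sparse A:

    % Sketch of a direct solve: factor once, then two triangular solves.
    [L, U, P, Q] = lu(A);           % sparse LU with row and column permutations
    x = Q * (U \ (L \ (P * B)));    % same result as x = A \ B for square A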

I would also point out that the CHOLMOD portion of SuiteSparse is now available in beta form that takes advantage of NVIDIA GPUs:

http://www.cise.ufl.edu/research/sparse/cholmod/

I believe it is possible to use CHOLMOD directly in Matlab (with some extra steps), but I don’t know whether you can pick up the GPU-accelerated beta that way yet. Again, it is not as simple as issuing a matrix left divide command: you would have to start thinking about specific solution methods, based on the specifics of your case (the characteristics of A).
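
For what it’s worth, the “extra steps” look roughly like this sketch; it assumes you have built SuiteSparse’s Matlab interface (its install scripts compile the mex files), and CHOLMOD only applies if your A turns out to be symmetric positive definite:

    % Hypothetical usage of CHOLMOD's Matlab interface. cholmod2 ships with
    % SuiteSparse (CHOLMOD/MATLAB) and has to be compiled/mex'ed first.
    % It is a sparse Cholesky "backslash", so A must be SPD.
    x = cholmod2(A, B);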

Certainly the other steps of your outline (moving data to/from Matlab) are feasible, and you could use either the mex interface or possibly the CUDA/PTX interface to do this.
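
A sketch of the mex route from the Matlab side (the file name is a placeholder for CUDA code you would still have to write against cuSPARSE; newer Matlab releases provide mexcuda, older ones need a custom nvcc mex configuration):

    % Hypothetical workflow: compile a CUDA mex wrapper, then call it like a
    % normal Matlab function, passing the sparse data in triplet form.
    mexcuda gpu_sparse_solve.cu -lcusparse
    [i, j, v] = find(A);
    x = gpu_sparse_solve(int32(i), int32(j), single(v), single(full(B)), int32(size(A,1)));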

Thanks for the information! Here are a few more details, then:

  • Yes, I meant “matrix left divide”, applied to a sparse system.
  • Here are a few characteristics of my A and B matrices:
  • First, the size is not fixed.
    A is a block diagonal matrix made of 12x12 blocks (I have a lot of these matrices). When I benchmarked my code (filling the matrix + performing A\B), the optimal number of concatenated matrices depended on the hardware, OS and lots of other things…; on the fastest computer I had, processing speed peaked at around 1000 matrices concatenated into one block diagonal matrix.
    Total size of A: m x n = 12000x12000
    Density: nz = 76 per 12x12 block, i.e. roughly 76000 non-zeros in total (very, very sparse)
    B is a vertcat of 12x1 vectors, each with only 1 non-zero element (a small sketch reproducing this structure follows this list)

  • A contains real elements, but it is not symmetric, not positive definite, nor anything else special.
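
To make that structure concrete, here is a tiny sketch that builds a matrix with the same shape out of random data (just to reproduce the sparsity pattern, not my actual values):

    % Sketch: ~1000 sparse 12x12 blocks on the diagonal, and B stacked from
    % 12x1 vectors that each hold a single non-zero entry.
    nblk = 1000; bsz = 12;
    blocks = cell(1, nblk);
    bvecs  = cell(nblk, 1);
    for k = 1:nblk
        blocks{k} = sprandn(bsz, bsz, 0.5) + speye(bsz);   % random block, kept invertible
        bvecs{k}  = sparse(randi(bsz), 1, randn, bsz, 1);  % 12x1 with one non-zero
    end
    A = blkdiag(blocks{:});    % 12000x12000 sparse block diagonal matrix
    B = vertcat(bvecs{:});     % 12000x1 sparse right-hand side
    x = A \ B;                 % the CPU baseline I benchmark today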

Conclusion(s): If you’ve read this far, congrats! ^^ I will have a look at the links you showed me. If anything else comes to mind, let me know.

Thanks,

Alex