Gauss–Jordan elimination

Hi all! I am looking for a Gauss–Jordan elimination algorithm implemented in CUDA. Can you help me, please?

It depends a little on what you want to do. CULAtools has implemented some LAPACK functions. If you want to do GJ elimination in each thread, I’m not sure whether CULAtools implements that.

I need to do GJ elimination without using libraries. I could probably use some code from libraries, but CULAtools does not seem to be open source.

I do not want to do GJ elimination in each thread. I want to reduce a square matrix to triangular form using as many threads as necessary.

The final goal is to solve a system of linear equations without using libraries. As I understand it, Gauss–Jordan elimination is not used for this, but rather some iterative method. Is anyone familiar with such stuff?

That is an overly broad question with no useful answer. What numerical method you can or should use depends mostly on the structure and properties of the linear system you are trying to solve. You probably want to consult a textbook on the subject first.

I thought it was a solved problem. A simple iteration method would be OK.

The point is that it isn’t a solved problem, it is many different solved problems. Which class of problems your system of equations falls into is the key question.

You cannot assume that. The mathematical properties of the linear system being solved have a pronounced effect on what methods are suited and how fast they will converge (if they converge at all).

Any system can be solved using the simple iteration method. (If the matrix does not pass the convergence test, there are some preliminary steps to convert it, and then the simple method is OK.)
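For anyone reading along, here is a minimal sketch of what a "simple iteration" solver might look like on the GPU. It assumes the method meant is Jacobi iteration, and it uses strict row diagonal dominance as the convergence test, which is a sufficient but not a necessary condition. The names isStrictlyDiagonallyDominant, jacobiSweep and jacobiSolve are made up for this illustration, error checking is omitted, and nothing here is taken from a library.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <utility>
#include <vector>

// Sufficient (not necessary) convergence test for Jacobi iteration:
// strict row diagonal dominance, |a_ii| > sum over j != i of |a_ij|.
bool isStrictlyDiagonallyDominant(const std::vector<float>& a, int n) {
    for (int i = 0; i < n; ++i) {
        float off = 0.0f;
        for (int j = 0; j < n; ++j)
            if (j != i) off += std::fabs(a[i * n + j]);
        if (std::fabs(a[i * n + i]) <= off) return false;
    }
    return true;
}

// One Jacobi sweep: each thread updates one unknown from the previous iterate.
__global__ void jacobiSweep(const float* a, const float* b,
                            const float* xOld, float* xNew, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = b[i];
    for (int j = 0; j < n; ++j)
        if (j != i) sum -= a[i * n + j] * xOld[j];
    xNew[i] = sum / a[i * n + i];
}

// Hypothetical host wrapper: runs a fixed number of sweeps starting from x = 0.
// Row-major n x n matrix a, right-hand side b.
std::vector<float> jacobiSolve(const std::vector<float>& a,
                               const std::vector<float>& b,
                               int n, int sweeps) {
    float *dA = nullptr, *dB = nullptr, *dX0 = nullptr, *dX1 = nullptr;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dX0, n * sizeof(float));
    cudaMalloc(&dX1, n * sizeof(float));
    cudaMemcpy(dA, a.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dX0, 0, n * sizeof(float));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < sweeps; ++it) {
        jacobiSweep<<<blocks, threads>>>(dA, dB, dX0, dX1, n);
        std::swap(dX0, dX1);  // the newest iterate is always in dX0
    }

    std::vector<float> x(n);
    cudaMemcpy(x.data(), dX0, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dX0); cudaFree(dX1);
    return x;
}
```

For the 2x2 system 4x + y = 6, 2x + 5y = 12 the matrix passes the dominance test and the iterates approach x = 1, y = 2 within a few dozen sweeps. For a matrix that fails the test, the same loop may not converge at all, which is the concern raised earlier in the thread.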

If you believe that, then I wish you luck. You are in for some fun. All I can recommend is a good text on solving linear equations.

Do you know any? (book)

The classic Golub and Van Loan “Matrix Computations” is probably still the best text on the mathematical foundations and theory of matrices and linear equations. The Saad book “Iterative Methods for Sparse Linear Systems” is a pretty good reference for “modern” sparse methods.

Thank You, avidday


I am working on an implementation of the Gauss algorithm in the CUDA environment, and so far I have written this code:

[codebox]__global__ static void Gauss_Parallel(float *a, float *x, float *b, int n, const int b_size)
{
    __shared__ float m;

    int tx = threadIdx.x;

    for (int k = 0; k < n - 1; k++)
    {
        for (int i = k + 1; i < n; i++)
        {
            int j = tx;
            // ...
        }
    }

    for (int i = n - 1; i >= 0; i--)
    {
        x[i] = b[i];

        for (int j = i + 1; j < n; j++) x[i] -= a[i*n+j] * x[j];
        // ...
    }
}[/codebox]

Unfortunately I’m getting “Microsoft C++ exception: cudaError_enum at memory location 0x0012fdd4…” in my VC++ IDE. Previously I wrote a version where the number of threads in a block was the same as the number of columns in the matrix, and it worked fine. Now, as you can see, I’m trying to make one thread do more operations: precisely, the subtraction a[i*n+j]=a[i*n+j]-m*b[k*n+tx] on every b_size-th element of a row. I’m a beginner in CUDA so don’t eat me alive. :)

And sorry for my poor English.

Still no suggestions, not even one?

I’m writing an entire work about CUDA for a technical degree, so I would very much appreciate any help.
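The posted fragment is too incomplete to say what triggers the cudaError, but the strided scheme described above (one thread handling every b_size-th element of a row) can be sketched as follows. This is only an illustration under stated assumptions: a single thread block, no pivoting (every pivot a[k*n+k] is assumed nonzero), blockDim.x standing in for b_size, and back substitution done serially by thread 0. The names gaussStrided and solveGauss are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Forward elimination with a column-strided loop, then serial back substitution.
// Must be launched with exactly one block. No pivoting is performed.
__global__ void gaussStrided(float* a, float* x, float* b, int n) {
    __shared__ float m;           // multiplier shared by all threads of the block
    int tx = threadIdx.x;
    int stride = blockDim.x;      // assumed to play the role of b_size

    for (int k = 0; k < n - 1; ++k) {
        for (int i = k + 1; i < n; ++i) {
            if (tx == 0) m = a[i * n + k] / a[k * n + k];
            __syncthreads();      // every thread now sees the new multiplier

            // Thread tx updates columns k+tx, k+tx+stride, ... of row i.
            // The bound j < n keeps the access inside the row.
            for (int j = k + tx; j < n; j += stride)
                a[i * n + j] -= m * a[k * n + j];
            if (tx == 0) b[i] -= m * b[k];
            __syncthreads();      // row i is finished before m is overwritten
        }
    }

    // Back substitution on the upper-triangular system, done by one thread.
    if (tx == 0) {
        for (int i = n - 1; i >= 0; --i) {
            float s = b[i];
            for (int j = i + 1; j < n; ++j) s -= a[i * n + j] * x[j];
            x[i] = s / a[i * n + i];
        }
    }
}

// Hypothetical host wrapper: copies the system to the device, runs one block
// of `threads` threads, and returns the solution vector.
std::vector<float> solveGauss(const std::vector<float>& a,
                              const std::vector<float>& b,
                              int n, int threads) {
    float *dA = nullptr, *dB = nullptr, *dX = nullptr;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dX, n * sizeof(float));
    cudaMemcpy(dA, a.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    gaussStrided<<<1, threads>>>(dA, dX, dB, n);

    std::vector<float> x(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(x.data(), dX, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dX);
    return x;
}
```

Calling solveGauss(a, b, 3, 2) runs a 3x3 system on two threads, so each thread covers more than one column. Two things in this sketch may be worth comparing against the original kernel: the inner loop is bounded by j < n, and there is a __syncthreads() both after the multiplier is written and after the row update. A single block with serial back substitution will not be fast; the sketch only shows the indexing and synchronization.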