Any method faster than the Gauss elimination method?

Abdopensky · October 6, 2022, 6:52pm

Hi,

I implemented my own CUDA kernel for solving the linear system AX = B by using the Gauss elimination method. I solved a 1500 x 1500 system and it takes roughly 1500 milliseconds. Is there any faster method that I can exploit? I am using an RTX 3070 Laptop GPU.

Thanks

Abdoulaye
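For reference, a minimal host-side sketch of Gauss elimination with partial pivoting is shown below. It is not the poster's CUDA kernel, which was not shared; the function name `gauss_solve` and the dense row-major layout are assumptions made for illustration. A CPU version like this can serve as a baseline for checking the output of a GPU implementation on small systems. Elimination on an n x n system costs on the order of 2n³/3 floating-point operations, so about 2.25 billion for n = 1500.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: solves A x = b for a dense row-major n x n matrix A.
// A is overwritten; on return, b holds the solution x.
void gauss_solve(std::vector<double>& A, std::vector<double>& b) {
    const std::size_t n = b.size();
    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: find the row i >= k with the largest |A[i][k]|.
        std::size_t pivot = k;
        for (std::size_t i = k + 1; i < n; ++i) {
            if (std::fabs(A[i * n + k]) > std::fabs(A[pivot * n + k])) pivot = i;
        }
        if (A[pivot * n + k] == 0.0) throw std::runtime_error("singular matrix");
        if (pivot != k) {
            for (std::size_t j = 0; j < n; ++j) std::swap(A[k * n + j], A[pivot * n + j]);
            std::swap(b[k], b[pivot]);
        }
        // Eliminate column k from every row below the pivot row.
        for (std::size_t i = k + 1; i < n; ++i) {
            const double f = A[i * n + k] / A[k * n + k];
            for (std::size_t j = k; j < n; ++j) A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution on the resulting upper-triangular system.
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= A[i * n + j] * b[j];
        b[i] = s / A[i * n + i];
    }
}
```

For example, calling it with `A = {2, 1, 1, 3}` and `b = {4, 7}` leaves `b` holding the solution of that 2 x 2 system.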

Robert_Crovella · October 6, 2022, 7:10pm

Did you try using the cusolver library? You can use a Google search to find many questions about solving linear systems with cusolver, including sample codes provided by NVIDIA.

Detailed questions about using cusolver or any of the CUDA math libraries should be posted on the libraries forum.

If I were working on such a problem, I definitely would not start out by writing my own code, except maybe as a learning exercise. I would investigate high-quality libraries first.

If you are asking “is there any other faster method I can exploit while writing my own kernels”, I won’t be able to help there - perhaps others will have suggestions.

Abdopensky · October 6, 2022, 7:20pm

I started with cuSolver before, but I wasn’t sure about the results (I probably did not implement it correctly; I will try again anyway). But before going forward, do you think that I can get better performance by using this library?
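One way to gain confidence in a solver's output, whichever library produced it, is to copy the solution back to the host and check the residual AX - B directly. The sketch below is a hedged illustration: the name `max_residual` and the row-major layout are assumptions. cuSOLVER's dense routines follow the column-major convention, so for data laid out that way the index would be `A[j * n + i]`; mixing up the two layouts is one possible cause of unexpected results.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns max_i |sum_j A[i][j] * x[j] - b[i]|
// for a dense row-major n x n matrix A, where n = b.size().
double max_residual(const std::vector<double>& A,
                    const std::vector<double>& x,
                    const std::vector<double>& b) {
    const std::size_t n = b.size();
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double row = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            row += A[i * n + j] * x[j];  // row-major indexing (assumption)
        }
        worst = std::max(worst, std::fabs(row - b[i]));
    }
    return worst;
}
```

A residual that is small relative to the magnitude of the entries of B suggests the solve is correct; a large one points to a setup problem such as layout, leading dimension, or an unchecked error status.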

Robert_Crovella · October 6, 2022, 7:24pm

I think it is likely. But really I am just guessing.

Abdopensky · October 6, 2022, 11:55pm

Do you know if it is normal for the function cusolverDnCreate to take too long for loading?

Robert_Crovella · October 7, 2022, 1:18am

I don’t know what that means; “too long for loading”.

My expectation is that for most overhead functions like this, it should take at most a few milliseconds.

I suggest asking questions specific to libraries on the libraries forum.

Abdopensky · October 7, 2022, 8:53am

With the following code, I observe a latency of around 2 seconds. I am using Debug mode in Visual Studio 2022 (C++/C).

cusolverDnHandle_t* cusolverH_Array = (cusolverDnHandle_t*)malloc(SimData.SelectedGPUNumber * sizeof(cusolverDnHandle_t));
cudaStream_t* stream_Array = (cudaStream_t*)malloc(SimData.SelectedGPUNumber * sizeof(cudaStream_t));
// For each selected GPU
for (int i_SelGPU = 0; i_SelGPU < SelectedGPUNumber; i_SelGPU++)
{
    cudaSetDevice(SelectedGPUIndex_Array[i_SelGPU]);
    /* step 1: create cusolver handle, bind a stream */
    cusolverH_Array[i_SelGPU] = NULL;
    stream_Array[i_SelGPU] = NULL;
}

Robert_Crovella · October 7, 2022, 9:04am

You’re not actually calling cusolverDnCreate in that code.

Robert_Crovella · October 7, 2022, 6:04pm

The first CUDA runtime API call per device could probably consume 300+ms, so that may be the main factor here.
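To see where the time actually goes, individual calls can be timed in code rather than by stepping through lines in a debugger. The helper below is a minimal sketch using only the C++ standard library; the name `elapsed_ms` is hypothetical and not part of CUDA or cuSOLVER.

```cpp
#include <chrono>

// Hypothetical helper: runs fn once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double elapsed_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In a CUDA program, one could wrap the first runtime call on each device (for example `cudaFree(0)`, a common idiom for forcing initialization) in one lambda and `cusolverDnCreate` in another, then compare the two numbers. If most of the delay lands on the first runtime call, it is likely initialization overhead rather than the cost of creating the cusolver handle.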

Abdopensky · October 7, 2022, 10:35pm

The 2-second latency occurs when stepping through the line cusolverH_Array[i_SelGPU] = NULL;

I just forgot to include cusolverDnCreate in the code above, but it still gives the same latency.

Any idea on this strange behaviour?

Robert_Crovella · October 7, 2022, 11:13pm

That’s just ordinary host C code. It has nothing to do with CUDA. No, I have no idea about the behavior.
