Functions like cudaMalloc() take the size as an argument of type ‘size_t’. This is an unsigned 64-bit integer type on all platforms supported by CUDA. Your variables like ‘A_num_rows’ are presumably of type ‘int’, a signed 32-bit integer type on all platforms supported by CUDA.
The compiler warns that the size computation is performed using ‘int’ arithmetic and overflows; the already-overflowed 32-bit result is then converted to the 64-bit argument type. That’s not what you want. You want the correct size computed as a 64-bit quantity, using 64-bit arithmetic throughout.
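If the allocation in your code looks roughly like the sketch below (the names ‘dA’ and ‘A_num_rows’ are assumptions based on your post), the overflow happens in the int-by-int multiplication, before anything is ever widened to ‘size_t’:

double *dA;
int A_num_rows = 164986;
// A_num_rows * A_num_rows is evaluated as 'int' * 'int' and overflows;
// only the already-wrapped result is then converted to 'size_t':
cudaMalloc((void **)&dA, A_num_rows * A_num_rows * sizeof(double));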
The best-practices idiom for this kind of computation is therefore to put the sizeof() part first:
sizeof(double) * A_num_rows * A_num_rows
Since the result of sizeof() is of type ‘size_t’, and the multiplications are evaluated left to right, all subsequent computation is performed using 64-bit unsigned integer arithmetic, so no intermediate product can overflow.
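Put together, a minimal sketch of the corrected allocation might look like this (again, ‘dA’ and ‘A_num_rows’ are names assumed from your post; add whatever error handling you normally use):

#include <cuda_runtime.h>

double *dA = NULL;
int A_num_rows = 164986;
// sizeof(double) has type 'size_t', so the product is carried out in
// 64-bit unsigned arithmetic from the first multiplication onward:
size_t bytes = sizeof(double) * A_num_rows * A_num_rows;
cudaError_t err = cudaMalloc((void **)&dA, bytes);
if (err != cudaSuccess) {
    // handle allocation failure, e.g. report cudaGetErrorString(err)
}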
You certainly won’t be able to store a dense matrix of 164986 x 164986 ‘double’ elements on the GPU, as that would require 218 GB of storage, whereas the maximum on-board memory of GPUs is ≤ 48 GB (the Quadro RTX 8000 is currently the GPU with the largest on-board memory).
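For reference, that figure comes from: 164986 * 164986 * 8 bytes = 217,763,041,568 bytes, i.e. about 218 GB.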