Matrix of size 164986x164986: possible to solve using LU factorization in CUDA?

My matrix is 164986x164986 and stored in CSR format. I have tried converting it to a dense matrix so that I can compute the LU decomposition with cuBLAS.

Is there any way to compute the LU decomposition on the GPU directly from the CSR values in CUDA?

I have already tried the Cholesky and QR methods in cuSOLVER; I am looking to solve using LU.
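For context, this is roughly the path I have been attempting (a simplified sketch, not my exact code: cusparseSparseToDense for the CSR-to-dense conversion followed by a dense LU via cusolverDnDgetrf; error checking and cleanup are omitted):

```cpp
// Sketch: convert a CSR matrix to dense on the GPU, then LU-factorize it.
// Assumes d_csrRowPtr, d_csrColInd, d_csrVal already hold the CSR data on
// the device; n = number of rows/columns, nnz = number of nonzeros.
#include <cuda_runtime.h>
#include <cusparse.h>
#include <cusolverDn.h>

void csr_to_dense_lu(int n, int nnz,
                     int *d_csrRowPtr, int *d_csrColInd, double *d_csrVal)
{
    cusparseHandle_t sp;    cusparseCreate(&sp);
    cusolverDnHandle_t dn;  cusolverDnCreate(&dn);

    // The dense copy alone needs sizeof(double) * n * n bytes of device memory.
    double *d_A = nullptr;
    cudaMalloc(&d_A, sizeof(double) * n * n);

    // CSR -> dense (column-major, so it can be fed to the dense solver).
    cusparseSpMatDescr_t matCsr;
    cusparseDnMatDescr_t matDense;
    cusparseCreateCsr(&matCsr, n, n, nnz, d_csrRowPtr, d_csrColInd, d_csrVal,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
    cusparseCreateDnMat(&matDense, n, n, n, d_A, CUDA_R_64F, CUSPARSE_ORDER_COL);

    size_t bufSize = 0;
    void *d_buf = nullptr;
    cusparseSparseToDense_bufferSize(sp, matCsr, matDense,
                                     CUSPARSE_SPARSETODENSE_ALG_DEFAULT, &bufSize);
    cudaMalloc(&d_buf, bufSize);
    cusparseSparseToDense(sp, matCsr, matDense,
                          CUSPARSE_SPARSETODENSE_ALG_DEFAULT, d_buf);

    // Dense LU factorization (P*A = L*U) with cuSOLVER.
    int lwork = 0;
    cusolverDnDgetrf_bufferSize(dn, n, n, d_A, n, &lwork);
    double *d_work;  cudaMalloc(&d_work, sizeof(double) * lwork);
    int *d_ipiv, *d_info;
    cudaMalloc(&d_ipiv, sizeof(int) * n);
    cudaMalloc(&d_info, sizeof(int));
    cusolverDnDgetrf(dn, n, n, d_A, n, d_work, d_ipiv, d_info);

    // ... use the factors, then free the buffers and destroy the handles.
}
```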

The compiler emits these warnings:

warning: integer overflow in expression [-Woverflow]
CHECK_CUDA(cudaMalloc(&d_A, A_num_rows * A_num_rows * sizeof(double)));

warning: integer overflow in expression [-Woverflow]
double *h_A = (double *)malloc(A_num_rows * A_num_cols * sizeof(double));

warning: integer overflow in expression [-Woverflow]
CHECK_CUDA(cudaMemcpy(h_A, d_A, A_num_rows * A_num_cols * sizeof(double), cudaMemcpyDeviceToHost));

Functions like cudaMalloc() take the size as an argument of type ‘size_t’. This is an unsigned 64-bit integer type on all platforms supported by CUDA. Your variables like ‘A_num_rows’ are presumably of type ‘int’, a signed 32-bit integer type on all platforms supported by CUDA.

The compiler warns that the size computation overflows when performed using ‘int’ arithmetic; the overflowed result is then assigned to the 64-bit argument. With A_num_rows = 164986, the intermediate product 164986 * 164986 ≈ 2.7 × 10^10 already exceeds INT_MAX (about 2.1 × 10^9), so the 32-bit result wraps around before it is ever widened. That is not what you want: you want the correct size computed as a 64-bit quantity, using 64-bit arithmetic.

The best-practices idiom for this kind of computation is therefore to put the sizeof() part first:

sizeof(double) * A_num_rows * A_num_rows

Since the result of sizeof() is of type ‘size_t’, subsequent computation is performed using 64-bit integer arithmetic.
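Applied to the three flagged lines, that looks like this (a minimal sketch; the variable names and the CHECK_CUDA macro are taken from the warnings above, with the byte count hoisted into a single size_t variable):

```cpp
// sizeof(double) yields a size_t, so both multiplications are carried out
// in 64-bit arithmetic and cannot wrap around.
size_t num_bytes = sizeof(double) * A_num_rows * A_num_cols;

CHECK_CUDA(cudaMalloc(&d_A, num_bytes));
double *h_A = (double *)malloc(num_bytes);
CHECK_CUDA(cudaMemcpy(h_A, d_A, num_bytes, cudaMemcpyDeviceToHost));
```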

You certainly won’t be able to store a dense matrix of 164986 x 164986 ‘double’ elements on the GPU, as that would require 218 GB of storage, whereas the maximum on-board memory of GPUs is ≤ 48 GB (the Quadro RTX 8000 is currently the GPU with the largest on-board memory).
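For reference, the arithmetic behind that figure, together with a runtime check against what the device actually offers (a small sketch using cudaMemGetInfo; the 164986 dimension is the one from the question):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 164986;
    // Dense storage: n * n doubles = 164986^2 * 8 bytes ~= 2.18e11 bytes ~= 218 GB.
    const size_t need = sizeof(double) * n * n;

    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);

    printf("dense matrix needs %.1f GB; device has %.1f GB total (%.1f GB free)\n",
           need / 1e9, total_bytes / 1e9, free_bytes / 1e9);
    return 0;
}
```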