This is outside my area of expertise, but I note that the documentation for cuSOLVER (https://docs.nvidia.com/cuda/cusolver/index.html) does not mention multi-GPU operation. I am curious: how large are these systems that they do not fit into the up to 32 GB of memory on modern GPUs? The literature seems to indicate that this is not an uncommon problem, e.g.

Manuel A. Diaz, Maxim A. Solovchuk, and Tony W. H. Sheu, “High-performance multi-GPU solver for describing nonlinear acoustic waves in homogeneous thermoviscous media.” *Computers & Fluids*, Vol. 173, 15 September 2018, pp. 195–205

“A double-precision numerical solver to describe the propagation of high-intensity ultrasound fluctuations using a novel finite-amplitude compressible acoustic model working in multiple processing units (GPUs) is presented. […] The present multi-GPU implementation aims to make the best use of every single GPU and gain optimal performance of the algorithm on the per-node basis. To assess the performance of the present solver, a typical mini-server computer with 4 Tesla K80 dual GPU accelerators is used.”