I’m using the unpivoted QR routine (geqrf) in cuSOLVER and noticed that the required workspace size, especially for small matrices, is much larger than I would expect:
enough to copy the whole matrix into the workspace many times over. Honestly, it looks like a bug in an edge case. Is this really the minimum memory the algorithm requires for small column counts?
I’m opening this thread to ask whether that is the case and, if not, to report it. For a rough comparison of expected memory usage, I’ve listed the workspace sizes next to those for the same call in MAGMA.
For 10x10: requires 49,152 vs MAGMA’s 640 (enough to store 491 copies of the input!)
For 100x100: requires 58,368 vs MAGMA’s 6,400
For 1000x1000: requires 288,768 vs MAGMA’s 64,000
For 8000x1000: requires 2,080,768 vs MAGMA’s 512,000
For 1000x8000: requires 1,065,896 vs MAGMA’s 64,000
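To make the comparison easier to read, here is a small Python sketch that turns the figures above into ratios: workspace relative to the matrix itself, and cuSOLVER relative to MAGMA. It assumes the reported sizes are element counts (the same unit as the m*n matrix entries), which is what the "491 copies" figure implies; the `reported` table and the `workspace_ratios` helper are just names made up for this illustration.

```python
# Workspace sizes reported in the post, keyed by (rows, cols).
# Values are (cuSOLVER workspace, MAGMA workspace).
# Assumption: sizes are element counts, the same unit as the m*n matrix entries.
reported = {
    (10, 10): (49152, 640),
    (100, 100): (58368, 6400),
    (1000, 1000): (288768, 64000),
    (8000, 1000): (2080768, 512000),
    (1000, 8000): (1065896, 64000),
}


def workspace_ratios(sizes):
    """Return {(m, n): (cusolver/matrix, magma/matrix, cusolver/magma)}."""
    out = {}
    for (m, n), (cusolver, magma) in sizes.items():
        elems = m * n  # number of entries in the input matrix
        out[(m, n)] = (cusolver / elems, magma / elems, cusolver / magma)
    return out


ratios = workspace_ratios(reported)
for (m, n), (cu, ma, rel) in ratios.items():
    print(f"{m}x{n}: cuSOLVER {cu:.2f}x matrix, "
          f"MAGMA {ma:.2f}x matrix, cuSOLVER/MAGMA {rel:.1f}x")
```

Under that assumption, the 10x10 workspace is roughly 491 times the size of the input, while for 1000x1000 it is already smaller than the matrix, so the overhead looks more like a large fixed cost than something that scales with the problem.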
In my app, I have matrices ranging from tiny to tall-and-skinny to large square, and the temporary/peak memory use adds up when streaming several of these kernels together (and keeping some of the output) on the GPU.
It likely won’t affect my app, but it was a curious thing to notice when preallocating large groups of workspaces.