I see how I can use cublasLtMatmulDescSetAttribute to set the fill mode to upper/lower. But the cublasLtMatmul call still writes the whole NxN memory range. Is there any notion of packing the rows to minimize memory usage, given that the call knows the matrix is upper/lower triangular?
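To be concrete about what I mean by packing, here is a minimal sketch of the column-major packed triangular layout used by the classic BLAS/LAPACK packed routines, which stores N(N+1)/2 elements instead of N*N. The helper names are hypothetical and are not part of cuBLASLt or cuSolver; whether either library accepts this layout is exactly what I am asking.

```cpp
#include <cstddef>

// Hypothetical helpers (not a cuBLASLt/cuSolver API): index math for
// column-major packed triangular storage, 0-based indices.

// Number of elements stored for an n x n triangular matrix.
constexpr std::size_t packed_size(std::size_t n) {
    return n * (n + 1) / 2;
}

// Upper triangle: valid for i <= j. Column j starts after 1 + 2 + ... + j elements.
constexpr std::size_t packed_upper_index(std::size_t i, std::size_t j) {
    return i + j * (j + 1) / 2;
}

// Lower triangle: valid for i >= j. Column j holds n - j elements (rows j..n-1).
constexpr std::size_t packed_lower_index(std::size_t i, std::size_t j, std::size_t n) {
    return i + j * (2 * n - j - 1) / 2;
}
```

For N = 4 this would need 10 elements rather than 16, so the saving approaches half the memory for large N.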
This only matters to me for running inside cuSolver, so if cuSolver cannot take a compressed (packed) triangular matrix as input, that also answers my question, because then it won't matter. Thanks!