Hi, CUDA fans!
I want to run a sparse matrix-vector multiplication with half-precision input/output and float computation, to reduce memory load.
But the call fails.
I tried this:
m, n, nnz,
descrA, csrSortedValA, CUDA_R_16F, csrSortedRowPtrA, csrSortedColIndA,
Is there anything wrong with the code, or is this combination not implemented yet?
If it is the latter, is there a plan to support it in the future?
It works if all types are CUDA_R_32F. Setting all types to CUDA_R_16F also works.
But half-precision computation lacks the precision my task needs.
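
For reference, the behavior I am after (half storage, float accumulation) can be sketched as a hand-written kernel. This is only a minimal sketch and not the library's implementation: the names csrmv_half_io_float_acc and csrmv_host are made up for illustration, and it assumes zero-based CSR indices, alpha = 1, beta = 0, one thread per row, and a CUDA version where __float2half and __half2float are callable from host code. Error checking is left out for brevity.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: val, x and y are stored as half,
// but each row's dot product is accumulated in a float register.
__global__ void csrmv_half_io_float_acc(int m,
                                        const int* rowPtr,
                                        const int* colInd,
                                        const __half* val,
                                        const __half* x,
                                        __half* y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;

    float sum = 0.0f;  // float accumulator
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        sum += __half2float(val[j]) * __half2float(x[colInd[j]]);

    y[row] = __float2half(sum);  // round once, on the final store
}

// Hypothetical host wrapper: converts float inputs to half, runs the
// kernel, and returns the half results converted back to float.
std::vector<float> csrmv_host(int m,
                              const std::vector<int>& rowPtr,
                              const std::vector<int>& colInd,
                              const std::vector<float>& val,
                              const std::vector<float>& x)
{
    std::vector<__half> hVal(val.size()), hX(x.size()), hY(m);
    for (size_t i = 0; i < val.size(); ++i) hVal[i] = __float2half(val[i]);
    for (size_t i = 0; i < x.size(); ++i)   hX[i]   = __float2half(x[i]);

    int *dRowPtr = nullptr, *dColInd = nullptr;
    __half *dVal = nullptr, *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dRowPtr, rowPtr.size() * sizeof(int));
    cudaMalloc(&dColInd, colInd.size() * sizeof(int));
    cudaMalloc(&dVal, hVal.size() * sizeof(__half));
    cudaMalloc(&dX, hX.size() * sizeof(__half));
    cudaMalloc(&dY, m * sizeof(__half));

    cudaMemcpy(dRowPtr, rowPtr.data(), rowPtr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dColInd, colInd.data(), colInd.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, hVal.data(), hVal.size() * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(dX, hX.data(), hX.size() * sizeof(__half), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (m + threads - 1) / threads;
    csrmv_half_io_float_acc<<<blocks, threads>>>(m, dRowPtr, dColInd, dVal, dX, dY);

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(hY.data(), dY, m * sizeof(__half), cudaMemcpyDeviceToHost);

    cudaFree(dRowPtr); cudaFree(dColInd);
    cudaFree(dVal); cudaFree(dX); cudaFree(dY);

    std::vector<float> y(m);
    for (int i = 0; i < m; ++i) y[i] = __half2float(hY[i]);
    return y;
}
```

One thread per row is the simplest mapping and will likely be slow for matrices with long or very uneven rows; the point is only that the products are summed in float and rounded to half once per output element.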