cusparseCsrmvEx with half I/O and float computation

Hi, CUDA fans!

I want to run a sparse matrix-vector multiply (SpMV) with half-precision input/output and float computation, to reduce memory traffic.
But the call fails.
This is what I tried:

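cusparseCsrmvEx(
    handle,
    CUSPARSE_ALG_NAIVE, transA,
    m, n, nnz,
    alpha, CUDA_R_16F,                  /* alpha and its type */
    descrA, csrSortedValA, CUDA_R_16F,  /* matrix values and their type */
    csrSortedRowPtrA, csrSortedColIndA,
    x, CUDA_R_16F,                      /* input vector and its type */
    beta, CUDA_R_16F,                   /* beta and its type */
    y, CUDA_R_16F,                      /* output vector and its type */
    CUDA_R_32F,                         /* execution type: compute in float */
    pBuffer);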

Is there anything wrong with the code, or is this combination not implemented yet?
If it is not implemented, is there a plan to support it in the future?

The call works if all types are CUDA_R_32F. It also works if all types are CUDA_R_16F.
But half-precision computation is not precise enough for my task.
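
To make the request concrete, here is a minimal CPU sketch in C of what I mean by half I/O with float calculation. It is not cuSPARSE code: round_to_half, csrmv_ref and demo_row_sum are hypothetical helpers made up for illustration, and half is only emulated by rounding floats to the nearest binary16 value, assuming strict IEEE float arithmetic (no fast-math).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Emulation only: round a float to the nearest IEEE binary16 (half) value,
   but keep the result stored in a float. Not a CUDA or cuSPARSE API. */
static float round_to_half(float x)
{
    uint32_t u, sign;
    float a;
    memcpy(&u, &x, sizeof u);
    sign = u & 0x80000000u;
    u &= 0x7FFFFFFFu;                        /* bits of |x| */
    if (u >= 0x7F800000u) return x;          /* inf or NaN: pass through */
    if (u < 0x38800000u) {                   /* |x| < 2^-14: half subnormal range */
        memcpy(&a, &u, sizeof a);
        a = (a + 0.5f) - 0.5f;               /* snaps to a multiple of 2^-24 */
        memcpy(&u, &a, sizeof u);
    } else {
        u += 0x0FFFu + ((u >> 13) & 1u);     /* round to nearest even at bit 13 */
        u &= ~0x1FFFu;                       /* drop the 13 low mantissa bits */
        if (u > 0x477FE000u) u = 0x7F800000u; /* above 65504: overflow to inf */
    }
    u |= sign;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* y = alpha * A * x + beta * y for a CSR matrix A with m rows.
   Inputs and outputs are rounded to half precision ("half I/O").
   half_accum == 0: row sums are accumulated in float (the mode I am asking for).
   half_accum != 0: every product and partial sum is rounded to half. */
void csrmv_ref(int m, const int *rowPtr, const int *colInd,
               const float *val, const float *x, float *y,
               float alpha, float beta, int half_accum)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k) {
            float prod = round_to_half(val[k]) * round_to_half(x[colInd[k]]);
            acc = half_accum ? round_to_half(acc + round_to_half(prod))
                             : acc + prod;
        }
        y[i] = round_to_half(alpha * acc + beta * round_to_half(y[i]));
    }
}

/* Build a 1 x nnz matrix of ones, multiply it by a vector of ones,
   and return the single output element y[0]. */
float demo_row_sum(int nnz, int half_accum)
{
    assert(nnz > 0);
    int *colInd = malloc((size_t)nnz * sizeof *colInd);
    float *val = malloc((size_t)nnz * sizeof *val);
    float *x = malloc((size_t)nnz * sizeof *x);
    int rowPtr[2] = {0, nnz};
    float y[1] = {0.0f};
    assert(colInd && val && x);
    for (int k = 0; k < nnz; ++k) {
        colInd[k] = k;
        val[k] = 1.0f;
        x[k] = 1.0f;
    }
    csrmv_ref(1, rowPtr, colInd, val, x, y, 1.0f, 0.0f, half_accum);
    free(colInd);
    free(val);
    free(x);
    return y[0];
}
```

With a single row of 4096 ones multiplied by a vector of ones, demo_row_sum(4096, 0) returns 4096, while demo_row_sum(4096, 1) stalls at 2048, because 2048 + 1 is not representable in half and rounds back to 2048. This is the kind of precision loss I want to avoid while still storing everything in half.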

Regards,
Hide