How to make cuSPARSE work with FP16 on a GPU of compute capability 7.0?

I am using the cuSPARSE csrmv routine with FP16 input. It works on a P100 (compute capability 6.0) but fails on a V100 (compute capability 7.0), where it returns the error "CUSPARSE_STATUS_ARCH_MISMATCH". I specified both architectures when compiling. Here are the failing call and the compile line.

Did I miss any important flags needed to make FP16 work on the V100?

checkCudaErrors(cusparseCsrmvEx_bufferSize(handle, CUSPARSE_ALG_NAIVE, CUSPARSE_OPERATION_NON_TRANSPOSE,
                n_rows, n_rows, nnz, &halpha, CUDA_R_16F,
                d_hvals, CUDA_R_16F,
                d_idx1, d_idx2,
                d_hp, CUDA_R_16F, &hbeta, CUDA_R_16F,
                d_hAp, CUDA_R_16F, CUDA_R_16F, &buffer_size));

nvcc -I/software/cuda-toolkit/9.2/samples/common/inc -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -o test -lcusparse