Trying to get cusparseCsrmvEx working

I’m using CUDA 9.0 and trying to get cusparseCsrmvEx working for load-balanced matrix-vector multiplication, but the multiplication call itself fails. Here is what I have:

size_t workInBytes = 0;
status = cusparseCsrmvEx_bufferSize(handle,
    CUSPARSE_ALG1,
    CUSPARSE_OPERATION_NON_TRANSPOSE,
    N, input_size, nnz,
    &s_one, CUDA_R_32F,
    descr,
    AA_devptr, CUDA_R_32F,
    IA_devptr,
    JA_devptr,
    input_vec_imag_devptr, CUDA_R_32F,
    &s_zero, CUDA_R_32F,
    &s_zero, CUDA_R_32F,
    CUDA_R_32F,
    &workInBytes);

if (status != CUSPARSE_STATUS_SUCCESS) {
    CLEANUP("buffer calculation failed");
    return 1;
}


status = cusparseCsrmvEx(handle,
    CUSPARSE_ALG1,
    CUSPARSE_OPERATION_NON_TRANSPOSE,
    N, input_size, nnz,
    &s_one, CUDA_R_32F,
    descr,
    AA_devptr, CUDA_R_32F,
    IA_devptr,
    JA_devptr,
    input_vec_imag_devptr, CUDA_R_32F,
    &s_zero, CUDA_R_32F,
    &s_zero, CUDA_R_32F,
    CUDA_R_32F,
    &workInBytes);

if (status != CUSPARSE_STATUS_SUCCESS) {
    CLEANUP("Matrix-vector multiplication failed");
    return 1;
}

Am I using the function incorrectly?

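Your use of &workInBytes in the cusparseCsrmvEx call doesn’t appear to be correct: the last argument of that function is the work buffer itself, not a pointer to its size. My guess, though, is that this is not the source of whatever problem you are having.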

What would be the correct input/output for the buffer?

The regular csrmv works for me:

status = cusparseScsrmv(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, N, input_size, nnz,
    &s_one, descr, AA_devptr, IA_devptr, JA_devptr,
    &input_vec_imag_devptr[0], &s_zero, &s_zero);
if (status != CUSPARSE_STATUS_SUCCESS) {
    CLEANUP("Matrix-vector multiplication failed");
    return 1;
}

It’s described in the reference manual:

https://docs.nvidia.com/cuda/cusparse/index.html

Read the descriptions of the two cuSPARSE functions you are using carefully.

Like I said, I doubt this is the problem. It’s nearly impossible to diagnose any problems based on what you provided. Many things could be wrong, such as your memory allocations, CSR matrix structure, etc.
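
One way to rule out the CSR structure is to check it on the host before copying anything to the GPU, and to compare the GPU result against a plain CPU product. Here is a rough sketch of both, assuming zero-based indexing and float values. The function names csr_check and csr_spmv_ref are made up for illustration. Going by the argument positions in the calls above, row_ptr would correspond to the host copy of IA, col_ind to JA, and val to AA.

```c
#include <stddef.h>

/* Hypothetical helper: returns 0 if a zero-based CSR structure with m rows,
 * n columns and nnz entries is self-consistent, otherwise a nonzero code.
 * row_ptr must have m + 1 entries and col_ind must have nnz entries. */
int csr_check(int m, int n, int nnz, const int *row_ptr, const int *col_ind)
{
    if (m < 0 || n < 0 || nnz < 0) return 1;   /* negative dimension */
    if (row_ptr[0] != 0)           return 2;   /* must start at 0 */
    if (row_ptr[m] != nnz)         return 3;   /* must end at nnz */
    for (int i = 0; i < m; ++i)
        if (row_ptr[i] > row_ptr[i + 1])
            return 4;                           /* row pointers not monotonic */
    for (int k = 0; k < nnz; ++k)
        if (col_ind[k] < 0 || col_ind[k] >= n)
            return 5;                           /* column index out of range */
    return 0;
}

/* Hypothetical CPU reference: y = A * x for a zero-based CSR matrix.
 * x must have one entry per column, y one entry per row. */
void csr_spmv_ref(int m, const float *val, const int *row_ptr,
                  const int *col_ind, const float *x, float *y)
{
    for (int i = 0; i < m; ++i) {
        float sum = 0.0f;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_ind[k]];
        y[i] = sum;
    }
}
```

If csr_check passes and the GPU output still differs from csr_spmv_ref by more than rounding error, the problem is more likely in the allocations, the copies, or the arguments passed to the cuSPARSE call than in the matrix itself.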