Performance of SpMV in cuSparse


I wrote the code below to measure the performance of SpMV in cuSparse on a Tesla C2075. Using matrices from the UF (University of Florida) Sparse Matrix Collection as test cases, I get only 0.03 GFlops for some of them, e.g. “Webbase”. Something is obviously wrong, but I can’t figure out what. Has anyone measured the performance of SpMV in cuSparse? Should I use a different method to measure the time?

cudaEventRecord(start_event, 0);

status = cusparseDcsrmv(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        m, n, nnz, &ftwo,
                        escr, val_dev, rowptr_dev, colidx_dev,
                        x_dev, &fone, z_dev);

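cudaEventRecord(stop_event, 0);
/* Block until the stop event has actually been reached on the GPU;
   without this, cudaEventElapsedTime can fail with cudaErrorNotReady. */
cudaEventSynchronize(stop_event);
cudaEventElapsedTime(&timepass, start_event, stop_event);  /* result is in milliseconds */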
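
For reference, here is a minimal sketch of how an elapsed time like `timepass` could be turned into a GFlops figure. It assumes the usual convention of counting two floating-point operations (one multiply, one add) per stored nonzero, and that the time is in milliseconds, which is what `cudaEventElapsedTime` returns. The helper name `spmv_gflops` and the `repeats` parameter are hypothetical and not part of the original code. `repeats` is there in case the SpMV call is launched several times between the two events to average out launch overhead.

```c
/* Hypothetical helper: convert an elapsed time in milliseconds into GFLOP/s
   for CSR SpMV, counting 2 flops (multiply + add) per stored nonzero.
   `repeats` is the number of SpMV calls issued between the two events. */
double spmv_gflops(long long nnz, int repeats, float elapsed_ms)
{
    /* Guard against an unset or invalid timing result. */
    if (elapsed_ms <= 0.0f || repeats <= 0) {
        return 0.0;
    }
    double flops   = 2.0 * (double)nnz * (double)repeats;
    double seconds = (double)elapsed_ms * 1.0e-3;   /* ms -> s */
    return flops / seconds * 1.0e-9;                /* flop/s -> GFLOP/s */
}
```

With purely illustrative numbers (not measurements), `spmv_gflops(3000000, 100, 150.0f)` gives about 4.0: 6e8 flops in 0.15 s. A common mistake that would make results look orders of magnitude too low is treating the millisecond value as seconds, or timing a single cold launch that includes one-time setup cost.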