Hi Everyone,

I am running sparse matrix-vector multiplication (SpMV) on an A100 40GB for varying matrix sizes and sparsity levels, using the COO format. Below is the plot of the results:

I am dealing with structured sparsity involving diagonals, i.e., non-zeros are present only on diagonals (the main diagonal plus off-diagonals).
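To make the layout concrete, here is a minimal host-side sketch of how such a diagonal-only matrix could be laid out in COO. The helper names (`diag_length`, `fill_diagonal_coo`), the 32-bit index type, and the constant fill value are assumptions for illustration, not my actual setup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of entries on the diagonal with the given offset in an n x n matrix
   (offset 0 = main diagonal, > 0 = above it, < 0 = below it). */
static int64_t diag_length(int64_t n, int64_t offset) {
    int64_t k = offset < 0 ? -offset : offset;
    return k < n ? n - k : 0;
}

/* Fill COO arrays for an n x n matrix whose non-zeros lie only on the given
   diagonals. The caller must size rows/cols/vals to hold the sum of
   diag_length() over all offsets. Returns the number of non-zeros written.
   Hypothetical helper: every non-zero is set to the same value. */
static int64_t fill_diagonal_coo(int64_t n, const int64_t *offsets,
                                 size_t num_offsets, float value,
                                 int32_t *rows, int32_t *cols, float *vals) {
    assert(n > 0 && n <= INT32_MAX); /* indices must fit in 32 bits */
    int64_t nnz = 0;
    for (size_t d = 0; d < num_offsets; ++d) {
        int64_t off = offsets[d];
        /* Diagonals below the main one start at row -off, column 0. */
        int64_t row_start = off < 0 ? -off : 0;
        int64_t len = diag_length(n, off);
        for (int64_t i = 0; i < len; ++i) {
            rows[nnz] = (int32_t)(row_start + i);
            cols[nnz] = (int32_t)(row_start + i + off);
            vals[nnz] = value;
            ++nnz;
        }
    }
    return nnz;
}
```

Note that this sketch emits entries diagonal by diagonal, so they are not ordered by row; they would need sorting before being passed to a library that expects row-sorted COO.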

What I find strange is the performance improvement I get for the matrix of size 65536 at sparsity levels >= 0.5. For sparsity < 0.5, I don’t see the 10x performance benefit. Unfortunately, I cannot run matrices larger than 65536 at a sparsity of 0.5 or less, as they do not fit on my single A100 GPU.
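For a rough sense of the memory limit, the COO footprint can be estimated as below. This is a back-of-the-envelope sketch that assumes sparsity means the fraction of zero entries, 32-bit row and column indices, and 32-bit float values; the function names are made up for illustration.

```c
#include <stdint.h>

/* Estimated number of non-zeros in an n x n matrix, assuming "sparsity"
   is the fraction of entries that are zero. */
static uint64_t nnz_for_sparsity(uint64_t n, double sparsity) {
    return (uint64_t)((double)n * (double)n * (1.0 - sparsity));
}

/* Bytes needed for the three COO arrays: one row index, one column index,
   and one value per non-zero. */
static uint64_t coo_bytes(uint64_t nnz, uint64_t index_bytes,
                          uint64_t value_bytes) {
    return nnz * (2 * index_bytes + value_bytes);
}
```

Under these assumptions, `nnz_for_sparsity(65536, 0.5)` gives 2^31 non-zeros, and `coo_bytes(..., 4, 4)` gives about 24 GiB for the matrix arrays alone. The same estimate for 131072 at sparsity 0.5 is about 96 GiB, which is consistent with it not fitting in 40 GB. It may also be worth noting that 2^31 is one more than the largest signed 32-bit integer.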

Is there an explanation for this behavior?

This is a snippet of my code:

```
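// Host wall-clock timer; brackets the GPU event timing below
clock_gettime(CLOCK_MONOTONIC, &startNew);
cudaEventRecord(start);
// y = alpha * A * x + beta * y, with A described by matDescr
cusparseSpMV(cusparseHandle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matDescr, vecX, &beta, vecY,
             CUDA_R_32F, CUSPARSE_MV_ALG_DEFAULT, d_buffer);
cudaEventRecord(stop);
// Wait for the stop event so both timers cover the completed SpMV
cudaEventSynchronize(stop);
clock_gettime(CLOCK_MONOTONIC, &stopNew);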
```