This is not directly related to GPUs since I am currently running on the CPU only, though I do plan to port it to the GPU later. I didn't know where else to post this question; if this is the wrong place, can anyone direct me to the proper area?
Anyway, I have implemented sparse matrix-vector multiplication using the Compressed Row Storage (CRS) and Block Compressed Row Storage (BCRS, 2x2 blocks) formats. Regardless of the size of the randomly generated sparse matrix (or its number of non-zeros), the BCRS version always performs worse than plain CRS. Does anyone have any idea why this would be?