Sparse Matrix -> Vector


I have a sparse matrix which I’ve stripped out and contained in a vector “A”. I plan to send this vector to my GPU to do some computations and return the results. I have to take every number in the vector A and multiply it by a vector “B”. However, my multiplication has to happen in a specific order - if it were still in the sparse matrix format, it would multiply all of the non-zero elements in a column by the vector B.

If I keep track of how many non-zero elements I have per column, is there a way to tell CUDA to multiply elements x-y in vector A by vector B? For instance, here is the code I have:

__global__ void L(complex *A, complex *B, int last, int next, int size)
{
	int tid = blockIdx.x * blockDim.x + threadIdx.x;

	if (tid < next && tid > last) {
		A[tid] *= B[tid];
	}
}

last is the starting point in vector A and next is the end point in vector A. The problem with this code is that the if statement essentially shuts down any threads not operating within (tid < next && tid > last).

If I simply loop through the vector, it is actually much slower than just sending the entire sparse matrix to the card and computing it that way.

So basically, I need to take a chunk of vector A, multiply it by B, then move to the next chunk and do the same thing.
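In other words, is there a way to launch just enough threads for one chunk and offset the index, something like this rough, untested sketch (I'm using cuFloatComplex from cuComplex.h as a stand-in for my complex type)?

```cuda
#include <cuComplex.h>

// Each launch covers one chunk [last, next): tid is shifted by last, so
// every launched thread does useful work instead of failing the range test.
__global__ void multiplyChunk(cuFloatComplex *A, const cuFloatComplex *B,
                              int last, int next)
{
	int tid = blockIdx.x * blockDim.x + threadIdx.x + last;
	if (tid < next)
		A[tid] = cuCmulf(A[tid], B[tid]);
}

// Host side: launch only (next - last) threads per chunk, e.g.
// int n = next - last;
// multiplyChunk<<<(n + 255) / 256, 256>>>(d_A, d_B, last, next);
```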

Thoughts or suggestions would be great!

So this is effectively computing the Hadamard product of a pair of sparse vectors?

For non-trivial matrix sizes and a non-trivial number of non-zeros, you might find using a block per row or column of A to be the best approach. If you pass another array containing the starting position of each row/column of A (if the format is something like CSR or CSC, you can get this using a cumulative sum over the number-of-non-zeros array), then each block can work in parallel.
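For example (just a sketch, assuming a col_ptr array of length num_cols + 1 built from that cumulative sum, and cuFloatComplex standing in for your complex type):

```cuda
#include <cuComplex.h>

// One block per column: col_ptr[c] is the offset in A where column c's
// non-zeros start, so [col_ptr[c], col_ptr[c+1]) is that column's range.
// Threads in the block stride over the range, and all columns are
// processed in parallel by a single launch.
__global__ void multiplyByColumn(cuFloatComplex *A, const cuFloatComplex *B,
                                 const int *col_ptr)
{
	int start = col_ptr[blockIdx.x];
	int end   = col_ptr[blockIdx.x + 1];
	for (int i = start + threadIdx.x; i < end; i += blockDim.x)
		A[i] = cuCmulf(A[i], B[i]);
}

// multiplyByColumn<<<num_cols, 128>>>(d_A, d_B, d_col_ptr);
```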

Thanks for your reply, avidday. It’s not the Hadamard product - I should have explained that vector B is simply a vector, not a sparse matrix. Vector A originated as a sparse matrix, and I’m using CSR to store it in a single vector. I am trying to parallelize something known as Bifactorization (I think it has been “renamed” to the W method). I’m basically grabbing all of the elements below the main diagonal and putting those into a smaller vector, then running some computations on it, then I’m doing the same thing for the elements above the main diagonal and running some different computations on it.

For each element in the results vector, C, I take the corresponding A element and multiply it by the entire column (of the sparse matrix) starting at the same position as A. But since this is now a CSR-formatted vector, I have to stop before the next column’s data begins. For example, say the results vector and vector B are 10,000 elements long. For each column in the sparse matrix (which is in CSR format), C[100] = A[100]*B[100] + A[100]*B[101] … + A[100]*B[10000]. The next iteration would be C[101] = A[101]*B[101] … A[101]*B[10000]. Since only the results vector is changing, it would be useful to say C[threadIdx.x] = A[threadIdx.x] * B[all threads > threadIdx.x];.

The latter couple of sentences got mathematically more and more confused, but do I understand correctly that what you really want to do is sum the vector B over a certain index range, and then compute the product of a corresponding non-zero in A with that sum?

I’m actually quite terrible with math, which is why my explanations aren’t the greatest. I want, for each element in my vector A (which is a CSR vector) to compute the product of A and B, for each element of B. So, in sequential code - for the entire sparse matrix, I would have something like:

for (int el = 0; el < NUM_BUSES; el++)			// loop through the sparse matrix column by column
{							// (NUM_BUSES * NUM_BUSES = size of the sparse matrix)
	for (int j = 0; j < el; j++)			// elements above the diagonal are just copied over
		results[j] = B[j];

	results[el] = A[el][el] * B[el];		// element on the diagonal

	for (int j = el + 1; j < NUM_BUSES; j++)	// elements below the diagonal
		results[j] = A[j][el] * B[el];

	B = results;					// feed the results into the next column's pass
}

Since I’m compacting the sparse matrix into a vector, I only want to do my computations on a given index range, before moving onto the next index range (i.e. column 1’s numbers before column 2’s numbers).
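To show what I mean by index ranges, here is a rough sketch of the same pass over the packed vector. col_ptr and row_idx are hypothetical companion arrays (CSC-style): [col_ptr[el], col_ptr[el+1]) is the range of column el's stored entries in A, and row_idx[i] is the matrix row of the packed element A[i].

```cuda
// Sequential sketch over the packed vector (hypothetical col_ptr/row_idx)
for (int el = 0; el < NUM_BUSES; el++)
{
	for (int j = 0; j < el; j++)			// above the diagonal: copy
		results[j] = B[j];

	// column el's stored entries (diagonal and below) live in one
	// contiguous index range of the packed vector A
	for (int i = col_ptr[el]; i < col_ptr[el + 1]; i++)
		results[row_idx[i]] = A[i] * B[el];

	B = results;					// next column works on the updated vector
}
```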

Thanks again for your response.

Maybe someone more erudite than I am can help you. I re-read your three posts, but I still can’t understand what it is you are trying to do, sorry.

No worries - I appreciate you taking a look to begin with.

Maybe you simply want the equivalent of CSRMV (multiplication of a sparse matrix by a dense vector)?

You should look into the CUSPARSE library, available in the 3.2 Toolkit.
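For single-precision real values the legacy call looks roughly like this (the exact signature has varied across toolkit versions, and the csrmv entry points were later superseded by the generic cusparseSpMV API, so treat this as a sketch rather than exact code):

```cuda
#include <cusparse.h>

// y = alpha * A * x + beta * y, with A stored in CSR on the device
cusparseHandle_t handle;
cusparseCreate(&handle);

cusparseMatDescr_t descr;
cusparseCreateMatDescr(&descr);

float alpha = 1.0f, beta = 0.0f;
cusparseScsrmv(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
               m, n, alpha, descr,
               csrVal, csrRowPtr, csrColInd,	// CSR arrays (device pointers)
               x, beta, y);

cusparseDestroyMatDescr(descr);
cusparseDestroy(handle);
```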