Hello,

I have a sparse matrix whose non-zero elements I’ve stripped out and stored in a vector “A”. I plan to send this vector to my GPU, do some computations there, and return the results. I have to multiply every number in vector A by a vector “B”. However, the multiplication has to happen in a specific order: if the data were still in sparse matrix format, it would multiply all of the non-zero elements in a column by the vector B.

If I keep track of how many non-zero elements I have per column, is there a way to tell CUDA to multiply only elements x through y of vector A by vector B? For instance, here is the code I have:

```
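// Multiplies A[tid] by B[tid] for every tid in the range [last, next).
// complex is assumed to be a complex-number type with a device-side operator*=.
// size is currently unused.
__global__ void L(complex *A, complex *B, int last, int next, int size)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads whose index lies outside the range do no work.
    if (tid >= last && tid < next) {
        A[tid] *= B[tid];
    }
}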
```

last is the starting point in vector A and next is the end point. The problem with this code is that the if statement essentially shuts down any threads whose index falls outside the range between last and next.
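One possible way around the idle threads, as a rough sketch rather than a tested solution: launch only as many threads as the range contains and add the starting offset inside the kernel. Everything named here is an assumption for illustration: cuFloatComplex from cuComplex.h stands in for the complex type, the range is taken as half-open (last included, next excluded), and scaleRange and scaleRangeOnGpu are hypothetical names.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <vector>

// Thread 0 maps to element `first`, thread 1 to `first + 1`, and so on, so only
// the tail of the final block can fall outside the half-open range [first, next).
__global__ void scaleRange(cuFloatComplex *A, const cuFloatComplex *B,
                           int first, int next)
{
    int idx = first + blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < next) {
        A[idx] = cuCmulf(A[idx], B[idx]);  // same indexing as A[tid] *= B[tid]
    }
}

// Hypothetical host helper: copies A and B to the device, scales A[first, next),
// and copies A back. Returns false on invalid arguments or any CUDA error.
bool scaleRangeOnGpu(std::vector<cuFloatComplex> &A,
                     const std::vector<cuFloatComplex> &B,
                     int first, int next)
{
    if (first < 0 || next < first ||
        next > (int)A.size() || next > (int)B.size()) return false;
    if (first == next) return true;  // empty range, nothing to do

    const size_t bytesA = A.size() * sizeof(cuFloatComplex);
    const size_t bytesB = B.size() * sizeof(cuFloatComplex);
    cuFloatComplex *dA = nullptr, *dB = nullptr;

    bool ok = cudaMalloc(&dA, bytesA) == cudaSuccess &&
              cudaMalloc(&dB, bytesB) == cudaSuccess &&
              cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, B.data(), bytesB, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Size the grid from the range length, not from the whole vector.
        const int threads = 256;
        const int blocks = (next - first + threads - 1) / threads;
        scaleRange<<<blocks, threads>>>(dA, dB, first, next);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(A.data(), dA, bytesA, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    return ok;
}
```

The helper copies both vectors on every call only to keep the sketch self-contained; in real use the data would presumably stay on the card between launches.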

If I simply loop through the vector, it is actually much slower than just sending the entire sparse matrix to the card and computing it that way.

So basically, I need to take a chunk of vector A, multiply it by B, then move to the next chunk and do the same thing.
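To make the chunk-by-chunk idea concrete, here is a hedged sketch of doing every chunk in a single launch instead of one launch per chunk. It turns the per-column non-zero counts into an array of chunk start offsets (a running sum), gives each thread one element of A, and lets the thread look up which chunk it belongs to. The assumptions are mine and may not match the real problem: element i of each chunk is multiplied by B[i], cuFloatComplex stands in for the complex type, and the names scaleAllChunks, scaleChunksOnGpu, colStart and nnzPerCol are hypothetical.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <vector>

// colStart has numCols + 1 entries; column c owns A[colStart[c], colStart[c + 1]).
__global__ void scaleAllChunks(cuFloatComplex *A, const cuFloatComplex *B,
                               const int *colStart, int numCols)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= colStart[numCols]) return;  // past the last non-zero element

    // Binary search for the column with colStart[lo] <= tid < colStart[lo + 1].
    int lo = 0, hi = numCols;
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (colStart[mid] <= tid) lo = mid; else hi = mid;
    }
    int i = tid - colStart[lo];            // position inside this chunk
    // ASSUMPTION: element i of every chunk is multiplied by B[i].
    A[tid] = cuCmulf(A[tid], B[i]);
}

// Hypothetical host helper. Returns false on invalid input or any CUDA error.
bool scaleChunksOnGpu(std::vector<cuFloatComplex> &A,
                      const std::vector<cuFloatComplex> &B,
                      const std::vector<int> &nnzPerCol)
{
    // Running sum of the per-column counts gives the start of each chunk.
    std::vector<int> colStart(nnzPerCol.size() + 1, 0);
    for (size_t c = 0; c < nnzPerCol.size(); ++c) {
        if (nnzPerCol[c] < 0 || nnzPerCol[c] > (int)B.size()) return false;
        colStart[c + 1] = colStart[c] + nnzPerCol[c];
    }
    const int nnz = colStart.back();
    if (nnz != (int)A.size()) return false;
    if (nnz == 0) return true;

    const size_t bytesA = A.size() * sizeof(cuFloatComplex);
    const size_t bytesB = B.size() * sizeof(cuFloatComplex);
    const size_t bytesC = colStart.size() * sizeof(int);
    cuFloatComplex *dA = nullptr, *dB = nullptr;
    int *dCol = nullptr;

    bool ok = cudaMalloc(&dA, bytesA) == cudaSuccess &&
              cudaMalloc(&dB, bytesB) == cudaSuccess &&
              cudaMalloc(&dCol, bytesC) == cudaSuccess &&
              cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, B.data(), bytesB, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dCol, colStart.data(), bytesC, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (nnz + threads - 1) / threads;
        scaleAllChunks<<<blocks, threads>>>(dA, dB, dCol, (int)nnzPerCol.size());
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(A.data(), dA, bytesA, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dCol);
    return ok;
}
```

If B should be indexed differently (for example one entry of B per column), only the marked line in the kernel would change; the chunk lookup stays the same.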

Thoughts or suggestions would be great!