Add multiple vectors concurrently

Hi everyone,

I want to write a kernel that adds pairs of matrix rows concurrently, but I don’t know how to go about it.

For example, I have a data matrix of size (512, 1024), and I want to add its row pairs (row1+row2, row3+row4, …, row511+row512) at the same time.

I’m considering this simply to save time.

Could you give me some advice?
Thanks!

This is just a vector add with some careful indexing.

If you refer to the vectorAdd sample code, you’ll be well on your way.
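
To make the "careful indexing" concrete, here is a minimal sketch rather than a tuned implementation. It assumes the matrix is stored row-major as floats and that the row count is even; the names addRowPairs and runAddRowPairs are made up for illustration. Each thread computes one output element, out[r][c] = in[2r][c] + in[2r+1][c], so all 256 row pairs are added in a single launch.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per output element.
// Assumes row-major storage; output row r is the sum of input rows 2r and 2r+1.
__global__ void addRowPairs(const float *in, float *out, int outRows, int cols)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int r = blockIdx.y;                             // output row (pair) index
    if (r < outRows && c < cols) {
        out[r * cols + c] = in[(2 * r) * cols + c] + in[(2 * r + 1) * cols + c];
    }
}

// Hypothetical host helper: runs the kernel on a rows x cols matrix and
// returns true if the GPU result matches a CPU reference.
bool runAddRowPairs(int rows, int cols)
{
    int outRows = rows / 2;  // assumes rows is even
    size_t inCount = (size_t)rows * cols;
    size_t outCount = (size_t)outRows * cols;

    std::vector<float> hIn(inCount), hOut(outCount);
    for (size_t i = 0; i < inCount; ++i) hIn[i] = (float)(i % 97);  // small exact values

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, inCount * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, outCount * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, hIn.data(), inCount * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(256);
    dim3 grid((cols + block.x - 1) / block.x, outRows);  // x covers columns, y covers pairs
    addRowPairs<<<grid, block>>>(dIn, dOut, outRows, cols);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy waits for the kernel to finish before copying.
    ok = ok && (cudaMemcpy(hOut.data(), dOut, outCount * sizeof(float),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(dIn);
    cudaFree(dOut);
    if (!ok) return false;

    // Compare against a straightforward CPU loop.
    for (int r = 0; r < outRows; ++r)
        for (int c = 0; c < cols; ++c)
            if (hOut[(size_t)r * cols + c] !=
                hIn[(size_t)(2 * r) * cols + c] + hIn[(size_t)(2 * r + 1) * cols + c])
                return false;
    return true;
}

int main()
{
    bool ok = runAddRowPairs(512, 1024);
    printf("%s\n", ok ? "PASS" : "FAIL");
    return ok ? 0 : 1;
}
```

With columns mapped to threadIdx.x, neighbouring threads read neighbouring addresses, so both row reads and the write are coalesced. The block size of 256 is just a starting point, not a measured optimum.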

Robert is right.
If you are not confident writing kernels, you could also initialise a 256×512 permutation matrix of sorts (“Z”) and run it through cuBLAS, though speed will of course suffer, since the dense multiply spends most of its work on zeros:

[1 1 0 0 0 0 … 0 0]
[0 0 1 1 0 0 … 0 0]
[0 0 0 0 1 1 … 0 0]
 ⋮
[0 0 0 0 0 0 … 1 1]

and then out = Z*(“data matrix”), which has size (256, 1024).
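
A rough sketch of this matrix-multiply route, in case it helps. It assumes row-major float storage on the host side; the helper name runPairSumGemm is made up, while cublasSgemm is the standard single-precision GEMM call. Because cuBLAS is column-major, the sketch uses the usual trick of computing out^T = D^T * Z^T, which amounts to passing the row-major buffers as they are with the operands swapped.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: builds Z, computes out = Z * D with cuBLAS, and
// returns true if the result matches a CPU pairwise row sum.
bool runPairSumGemm(int rows, int cols)
{
    int outRows = rows / 2;  // assumes rows is even
    std::vector<float> hZ((size_t)outRows * rows, 0.0f);
    std::vector<float> hD((size_t)rows * cols), hOut((size_t)outRows * cols);

    // Row r of Z has ones in columns 2r and 2r+1, zeros elsewhere.
    for (int r = 0; r < outRows; ++r) {
        hZ[(size_t)r * rows + 2 * r] = 1.0f;
        hZ[(size_t)r * rows + 2 * r + 1] = 1.0f;
    }
    for (size_t i = 0; i < hD.size(); ++i) hD[i] = (float)(i % 97);

    float *dZ = nullptr, *dD = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dZ, hZ.size() * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dD, hD.size() * sizeof(float)) != cudaSuccess) { cudaFree(dZ); return false; }
    if (cudaMalloc(&dOut, hOut.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(dZ); cudaFree(dD); return false;
    }
    cudaMemcpy(dZ, hZ.data(), hZ.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dD, hD.data(), hD.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    bool ok = (cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS);
    if (ok) {
        const float alpha = 1.0f, beta = 0.0f;
        // Column-major view: dD is D^T (cols x rows), dZ is Z^T (rows x outRows),
        // so the product is (Z*D)^T, i.e. the row-major out buffer.
        ok = (cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                          cols, outRows, rows,
                          &alpha, dD, cols, dZ, rows,
                          &beta, dOut, cols) == CUBLAS_STATUS_SUCCESS);
        cublasDestroy(handle);
    }
    ok = ok && (cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(dZ); cudaFree(dD); cudaFree(dOut);
    if (!ok) return false;

    // Inputs are small integers, so the sums are exact in float.
    for (int r = 0; r < outRows; ++r)
        for (int c = 0; c < cols; ++c)
            if (hOut[(size_t)r * cols + c] !=
                hD[(size_t)(2 * r) * cols + c] + hD[(size_t)(2 * r + 1) * cols + c])
                return false;
    return true;
}

int main()
{
    bool ok = runPairSumGemm(512, 1024);
    printf("%s\n", ok ? "PASS" : "FAIL");
    return ok ? 0 : 1;
}
```

Build with something like nvcc pairsum.cu -lcublas (the file name is arbitrary). For the (512, 1024) case this performs 256×512×1024 multiply-adds, almost all of them against zeros, where a direct kernel needs only 256×1024 additions.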