Is there a library function for adding multiple vectors on CUDA?

Thanx in advance.

You could call cublas&lt;t&gt;axpy (e.g. cublasSaxpy for float) repeatedly.
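For example, accumulating several device vectors into one result with repeated calls might look like the sketch below (names like `vecs`, `num_vecs`, and `d_y` are illustrative; it assumes a valid cuBLAS `handle`, `d_y` zero-initialized on the device, and `vecs[]` holding device pointers of length `n` each):

```cuda
// y = alpha*x + y for each input vector in turn, so after the loop
// d_y holds the element-wise sum of all num_vecs vectors.
const float alpha = 1.0f;
for (int i = 0; i < num_vecs; ++i) {
    cublasSaxpy(handle, n, &alpha, vecs[i], 1, d_y, 1);
}
```

Each call launches a kernel, which is where the per-call overhead mentioned below comes from.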

You could arrange all of your vectors as the columns of a matrix and do a cublas&lt;t&gt;gemv operation with an all-ones multiplying vector. This might sound like “overkill”, but this problem is going to be memory-bound, so the added compute complexity of multiplication-by-1 for each element is not likely to make a difference. And there may be both memory-utilization efficiency and kernel-call-overhead efficiency benefits over the multiple-axpy method.

Thank you for the answer. Yes, your method seems like “overkill”. So, if there is no library function for adding multiple vectors, I’d better write my own kernel code for this task.
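If you do write your own kernel, a minimal sketch might look like the following (a sketch only, assuming the vectors are packed contiguously as `num_vecs` rows of length `n` in `d_in`; all names are illustrative):

```cuda
// Each thread sums element j across all vectors:
// out[j] = in[0*n + j] + in[1*n + j] + ... + in[(num_vecs-1)*n + j]
__global__ void sum_vectors(const float *in, float *out,
                            int num_vecs, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n) {
        float s = 0.0f;
        for (int i = 0; i < num_vecs; ++i)
            s += in[i * n + j];
        out[j] = s;
    }
}

// Example launch:
// sum_vectors<<<(n + 255) / 256, 256>>>(d_in, d_out, num_vecs, n);
```

This reads every input element exactly once, which matters since the problem is memory-bound.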

How about nppsAdd_xxx in NPP?

nppsAdd is a “Sample by sample addition of two signals.” (from the documentation, p. 2536).

That means it can add two vectors. The same can be done with the axpy function in cuBLAS.

The request in this thread (if I understand it correctly) is for a function that can add more than two vectors in a single call.

Doink says “cublasTaxpy seems like *overkill*”. nppsAdd is simple (it just adds two vectors).

Do you want to add vectors as (vec1 + vec2), (vec3 + vec4), …, or as (vec1 + vec2 + vec3 + vec4 + …)?

The Thrust library has some nice features for simple operations like adding vectors. You can wrap your device pointer and use Thrust to perform the operations.

Check out this PDF: http://on-demand.gputechconf.com/gtc-express/2011/presentations/Rapid-Problem-Solving-Using-Thrust.pdf

For float vectors Z = X + Y, the call looks like

```
thrust::transform(
    X.begin(), X.end(),
    Y.begin(),
    Z.begin(),
    thrust::plus<float>());
```