Multiple vectors adding

Doink · October 16, 2015, 12:57pm

Is there a library function for adding multiple vectors on CUDA?

Thanks in advance.

Robert_Crovella · October 16, 2015, 6:39pm

You could call cublas<t>axpy (e.g. cublasSaxpy for float) repeatedly, accumulating one vector at a time into the result.
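A minimal sketch of the repeated-axpy approach, for float data. The function name sum_vectors_axpy and the choice to pass the inputs as a std::vector of device pointers are assumptions made for illustration; only cublasSaxpy and cudaMemset are real library calls. It assumes the handle uses the default (host) pointer mode.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: sums the n-element device vectors in d_vecs into d_sum,
// issuing one cublasSaxpy call per input vector.
cublasStatus_t sum_vectors_axpy(cublasHandle_t handle,
                                const std::vector<const float*>& d_vecs,
                                float* d_sum, int n)
{
    const float one = 1.0f;  // alpha, read from host memory (default pointer mode)

    // Start from an all-zero result (all-zero bytes represent 0.0f).
    cudaMemset(d_sum, 0, n * sizeof(float));

    cublasStatus_t status = CUBLAS_STATUS_SUCCESS;
    for (const float* d_x : d_vecs) {
        // d_sum = 1.0f * d_x + d_sum
        status = cublasSaxpy(handle, n, &one, d_x, 1, d_sum, 1);
        if (status != CUBLAS_STATUS_SUCCESS) break;
    }
    return status;
}
```

Each call reads and rewrites the whole result vector, so adding k vectors this way costs k kernel launches and roughly 3k vector-sized memory transfers.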

You could arrange all of your vectors into a matrix and do a cublas<t>gemv operation, with the input vector being all ones. This might sound like “overkill”, but this problem is going to be memory-bound, so the added compute cost of a multiplication by 1 for each element is not likely to make a difference. And there may be both memory-utilization and kernel-call-overhead benefits over the multiple-axpy method.
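A rough sketch of the gemv idea, for float data. It assumes the k input vectors, each of length n, are stored back to back in one device allocation, which cuBLAS can treat as an n x k column-major matrix with leading dimension n. The function name sum_vectors_gemv and the parameter names are hypothetical; cublasSgemv is the real call.

```cuda
#include <cublas_v2.h>

// Hypothetical helper: d_A holds k vectors of length n back to back
// (vector j starts at d_A + j*n), d_ones holds k floats all equal to 1.0f,
// and d_sum receives the n-element sum of all k vectors.
cublasStatus_t sum_vectors_gemv(cublasHandle_t handle,
                                const float* d_A, const float* d_ones,
                                float* d_sum, int n, int k)
{
    const float alpha = 1.0f;
    const float beta  = 0.0f;  // d_sum is overwritten, not accumulated into

    // d_sum = alpha * A * ones + beta * d_sum, with A being n x k, column-major.
    return cublasSgemv(handle, CUBLAS_OP_N,
                       n, k,
                       &alpha, d_A, n,
                       d_ones, 1,
                       &beta, d_sum, 1);
}
```

If the vectors live in separate allocations, they would first have to be copied into one contiguous buffer, and that copy may cancel out part of the benefit.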

Doink · October 19, 2015, 8:31am

Thank you for the answer. Yes, your method seems like “overkill”. So, if there is no library function for adding multiple vectors, I’d better write my own kernel for this task.
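For reference, a hand-written kernel for this could look roughly like the sketch below. It assumes float data and that the k vectors of length n are stored back to back in one device buffer; the names sum_vectors_kernel and sum_vectors and the block size of 256 are illustrative choices, not something taken from this thread.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Each thread produces one output element: out[i] = sum over j of vecs[j*n + i].
__global__ void sum_vectors_kernel(const float* vecs, float* out, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = 0.0f;
        for (int j = 0; j < k; ++j) {
            // For a fixed j, neighbouring threads read neighbouring addresses.
            acc += vecs[(size_t)j * n + i];
        }
        out[i] = acc;
    }
}

// Hypothetical host wrapper: launches the kernel on the default stream.
void sum_vectors(const float* d_vecs, float* d_out, int n, int k)
{
    if (n <= 0) return;  // a zero-block launch would be an invalid configuration
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    sum_vectors_kernel<<<blocks, threads>>>(d_vecs, d_out, n, k);
}
```

This reads every input element once and writes every output element once in a single launch.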

episteme · November 5, 2015, 12:31pm

How about nppsAdd_xxx in NPP?

Robert_Crovella · November 5, 2015, 2:54pm

nppsAdd is a “Sample by sample addition of two signals.” (from the documentation, p2536).

That means it can add 2 vectors. That can also be done by the axpy function in cublas.

The request in this thread (if I understand it correctly) is for a function, in a single call, which can add more than 2 vectors at once.

episteme · November 6, 2015, 12:54am

Doink says “cublasTaxpy seems like overkill”. nppsAdd is simple (it just adds 2 vectors).
Do you want to add the vectors pairwise, as (vec1 + vec2), (vec3 + vec4), …, or all together, as (vec1 + vec2 + vec3 + vec4 + …)?

trf86 · November 12, 2015, 12:50pm

The Thrust library has some nice features for simple operations like adding vectors. You can wrap your device pointer and use Thrust to perform the operations.

Check out this PDF: http://on-demand.gputechconf.com/gtc-express/2011/presentations/Rapid-Problem-Solving-Using-Thrust.pdf

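For float vectors Z = X + Y (assuming X, Y, and Z are Thrust containers of equal length, such as thrust::device_vector<float>), the call looks like:

#include <thrust/transform.h>
#include <thrust/functional.h>

// Z[i] = X[i] + Y[i] for every element
thrust::transform(
    X.begin(), X.end(),
    Y.begin(),
    Z.begin(),
    thrust::plus<float>() );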
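To extend this to more than two vectors held as raw device pointers, one option is to wrap each pointer and accumulate in a loop. This is only a sketch: the function name sum_vectors_thrust and the std::vector-of-pointers interface are assumptions for illustration, while device_pointer_cast, fill, and transform are existing Thrust calls.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical helper: sums the n-element device vectors in d_vecs into d_sum.
void sum_vectors_thrust(const std::vector<const float*>& d_vecs,
                        float* d_sum, int n)
{
    // Wrap the raw device pointer so Thrust dispatches to the device backend.
    thrust::device_ptr<float> sum = thrust::device_pointer_cast(d_sum);
    thrust::fill(sum, sum + n, 0.0f);

    for (const float* d_x : d_vecs) {
        thrust::device_ptr<const float> x = thrust::device_pointer_cast(d_x);
        // In-place accumulation: sum[i] = sum[i] + x[i]
        thrust::transform(sum, sum + n, x, sum, thrust::plus<float>());
    }
}
```

Like repeated axpy calls, this launches one kernel per input vector, so it trades some efficiency for brevity.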