This might be more of a general BLAS API question.
If I want to add two vectors and store the result in a third vector (C[i] = A[i] + B[i]),
how can I do this with the CUBLAS API?
It seems that there isn’t a single API call that computes this operation.
What I think I could do is use two BLAS calls: one to copy A into C (C[i] = A[i]) and another to compute C[i] = C[i] + B[i].
Is this the only way to do this calculation using CUBLAS?
The kernel itself is extremely simple, but I’d like to know how other people do this using the BLAS API.
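A minimal sketch of the two-call approach described in the question follows. It assumes the CUBLAS v2 API (`cublas_v2.h`) and that A, B and C are already allocated in device memory as n contiguous floats. `cublasScopy` performs C = A, then `cublasSaxpy` performs C = alpha*B + C with alpha = 1. The wrapper name `vecAddCublas` is hypothetical, used only for illustration.

```cuda
#include <cublas_v2.h>

// Hypothetical helper: computes C = A + B on the device with two CUBLAS calls.
//   1. cublasScopy: C <- A
//   2. cublasSaxpy: C <- 1.0f * B + C
// dA, dB, dC are device pointers to n floats each, assumed already allocated.
cublasStatus_t vecAddCublas(cublasHandle_t handle, int n,
                            const float *dA, const float *dB, float *dC)
{
    cublasStatus_t st = cublasScopy(handle, n, dA, 1, dC, 1);
    if (st != CUBLAS_STATUS_SUCCESS)
        return st;

    // alpha lives in host memory, which matches the default
    // CUBLAS_POINTER_MODE_HOST of a freshly created handle.
    const float alpha = 1.0f;
    return cublasSaxpy(handle, n, &alpha, dB, 1, dC, 1);
}
```

The caller is responsible for creating the handle with `cublasCreate` and destroying it with `cublasDestroy`, and for copying C back to the host if it is needed there.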
If A and B are already on the device, writing your own kernel for this could be beneficial, and such a kernel is very simple to write. However, if you have to transfer A and B to the device and then copy C back, the transfer time will dominate: for every element, two floats go to the device and one comes back across the bus, while the device performs only a single addition. In that case the operation is best left to the CPU.
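If A and B are already in device memory, the simple kernel mentioned above might look like the following minimal sketch. The names `vecAdd` and `launchVecAdd` and the block size of 256 threads are illustrative assumptions, not something taken from the original post.

```cuda
#include <cuda_runtime.h>

// Element-wise C[i] = A[i] + B[i], one thread per element.
__global__ void vecAdd(const float *A, const float *B, float *C, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the last block may be only partially full
        C[i] = A[i] + B[i];
}

// Hypothetical host-side helper: launches vecAdd on device pointers that
// already hold n floats each, then waits for the kernel to finish.
cudaError_t launchVecAdd(const float *dA, const float *dB, float *dC, int n)
{
    if (n <= 0)
        return cudaSuccess;    // a launch with zero blocks would be an invalid configuration

    const int threads = 256;                          // common block size, not tuned
    const int blocks  = (n + threads - 1) / threads;  // round up to cover all n elements
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    cudaError_t err = cudaGetLastError();             // reports launch-configuration errors
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();                   // reports errors raised during execution
}
```

One likely reason a hand-written kernel can beat the two-call approach from the question (a copy followed by an axpy, e.g. `cublasScopy` then `cublasSaxpy`): this kernel reads A and B once and writes C once, whereas the copy reads A and writes C, and the axpy then reads B and C and writes C again. For a memory-bound operation like vector addition, that extra memory traffic tends to matter more than the arithmetic.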