Many cuBLAS level-1 functions, e.g. cublasaxpy, cublascopy, cublasdot, cublasrot, cublasrotm, and cublasswap, have similar arguments:
cublasStatus_t cublasSaxpy(cublasHandle_t handle, int n, const float *alpha, const float *x, int incx, float *y, int incy)
where parameter n is “number of elements in the vector x and y.”
My question is: given different incx and incy values, must vectors x and y each have n elements to avoid memory access problems? For instance, suppose x has n_x elements and y has n_y elements, where n_x = 2 * n_y. Will the call below cause a memory access violation?
The number of elements processed is identical for the x and y vectors. The storage spacing (in elements) may differ between x and y. Your underlying allocations therefore need to comprise at least 1+(n-1)*incx and 1+(n-1)*incy elements, respectively, to avoid out-of-bounds accesses.
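To make that bound concrete, here is a minimal host-side sketch. The helper name min_strided_extent is hypothetical and not part of cuBLAS; it uses the absolute value of the increment on the assumption that a negative increment spans the same number of storage elements, as in reference BLAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not a cuBLAS function): smallest number of elements
   the allocation behind a strided vector must hold so that n elements,
   spaced |inc| apart, all stay in bounds. */
size_t min_strided_extent(int n, int inc)
{
    assert(n >= 0);                 /* a negative count is a caller error in this sketch */
    if (n == 0)
        return 0;                   /* nothing is accessed */
    return 1 + (size_t)(n - 1) * (size_t)abs(inc);
}
```

For example, n = 4 with incx = 2 touches indices 0, 2, 4 and 6, so the allocation behind x needs at least 7 elements, while the same n with incy = 1 needs only 4.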
Yes, that would be a problem. You are asking cuBLAS to do n_x elementwise addition operations, but the vector y does not have n_x elements spaced 1 element apart (i.e. packed). If you only want to do n_y elementwise addition operations, it could work like this:
cublasSaxpy(handle, n_y, &alpha, x, 2, y, 1)
With alpha = 1, that means the first element of the resulting vector (y) would be the sum of the first elements of the two original vectors. The second element would be the sum of the second element of y and the third element of x, and so on.
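The element pairing can be checked on the host with a small reference loop. This is only a sketch of the strided indexing, not cuBLAS code: saxpy_ref is a hypothetical name, it uses 0-based C indexing, and it assumes positive increments.

```c
#include <assert.h>

/* Host-side reference for the strided update
       y[i * incy] = alpha * x[i * incx] + y[i * incy],  i = 0 .. n-1.
   Sketch only: assumes incx > 0 and incy > 0. */
void saxpy_ref(int n, float alpha, const float *x, int incx,
               float *y, int incy)
{
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i)
        y[i * incy] = alpha * x[i * incx] + y[i * incy];
}
```

With x = {1, 2, 3, 4, 5, 6}, y = {10, 20, 30}, n = 3, alpha = 1, incx = 2 and incy = 1, the loop reads x[0], x[2] and x[4], so y becomes {11, 23, 35}. Only 3 elements of y and 5 elements of x are touched.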
Anyway, can’t you test this for yourself? Run your code with cuda-memcheck (compute-sanitizer in newer CUDA toolkits) and any access violations will be evident.
Thanks a lot for the quick reply.
Then shouldn’t the documentation be changed from
n is “number of elements in the vector x and y”
to
n is “number of operations to be performed”?
In my program, the cuBLAS operation is applied to the middle section of two much bigger vectors, e.g. x = &x_g[x_offset] and y = &y_g[y_offset], so that with stride 2 in x, there is still valid space at or after &y[2 * n_y]. I understand that I can verify numerical correctness by examining y[n_y] through y[2 * n_y - 1]. And I realize that I overlooked the cublasaxpy documentation, which states
“Hence, the performed operation is y[j] = α × x[k] + y[j] for i = 1, …, n, k = 1 + (i - 1) * incx and j = 1 + (i - 1) * incy.”
Nevertheless, I still argue that the current documentation is confusing. Thank you all once again!
However, n is the actual length of the vectors x and y “that are participating in the operation”.
I think “number of operations to be performed” is potentially also confusing. I don’t think it should be considered correct terminology, and it could easily lead to questions and confusion from others. There is only one operation to be performed: a vector add. We could talk about how many elementwise operations are performed, but as you point out, the documentation already gives a fairly precise (and I believe, correct) description for clarification.
The inc parameters allow you to pick a “vector” out of a larger matrix, perhaps even a “column” vector out of a row-major matrix, or a “row” vector out of a column-major matrix. Considered this way, we would probably naturally think of the correct vector length that is participating in the operation, and the number of elements in that “vector”, so defined, is actually n.
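As an illustration of that view, the sketch below adds a scaled column of a row-major matrix to a packed vector on the host. The function name axpy_column and its parameters are hypothetical. On the device, the corresponding cuBLAS call would presumably be cublasSaxpy(handle, rows, &alpha, A + c, cols, y, 1): the column is a vector of n = rows elements that starts at A + c with incx = cols.

```c
#include <assert.h>

/* Sketch: y[i] += alpha * A[i][c] for a row-major matrix A of size rows x cols.
   Column c is treated as a strided vector of `rows` elements that starts at
   A + c and advances `cols` elements per step. */
void axpy_column(int rows, int cols, int c, float alpha,
                 const float *A, float *y)
{
    assert(c >= 0 && c < cols);
    const float *x = A + c;         /* first element of column c */
    const int incx = cols;          /* stride between consecutive column elements */
    for (int i = 0; i < rows; ++i)
        y[i] += alpha * x[i * incx];
}
```

For a 2 x 3 matrix A = {1, 2, 3, 4, 5, 6}, column c = 1 is the 2-element vector {2, 5}, so n is 2 even though the underlying storage holds 6 elements.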
One has to distinguish between the n-element vector (an abstract concept) and the underlying storage representation of that vector in the context of strided storage (a concrete implementation). BLAS-1 interfaces have traditionally expressed vector length n in terms of the former. CUBLAS merely adopted long-established existing usage.