The netlib documentation for the SGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY) routine says that:

```
* M - INTEGER.
* On entry, M specifies the number of rows of the matrix A.
* M must be at least zero.
* Unchanged on exit.
* N - INTEGER.
* On entry, N specifies the number of columns of the matrix A.
* N must be at least zero.
* Unchanged on exit.
* A - REAL array of DIMENSION ( LDA, n ).
* Before entry, the leading m by n part of the array A must
* contain the matrix of coefficients.
* Unchanged on exit.
* LDA - INTEGER.
* On entry, LDA specifies the first dimension of A as declared
* in the calling (sub) program. LDA must be at least
* max( 1, m ).
* Unchanged on exit.
```

This means that A should be an LDA x N array with LDA >= max(1,M), and that the matrix-vector product will involve only the leading (upper) M x N submatrix of A.
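To make that reading concrete, here is a minimal sketch in C of the non-transposed case with unit strides. It is not the netlib source: the name `ref_sgemv_n` and the `-1` error return are my own assumptions for illustration. It only shows how a column-major array with leading dimension `lda` is indexed, and that rows `m` through `lda-1` of each column are never touched.

```c
#include <stddef.h>

/* Hypothetical reference helper (not the netlib implementation):
 * computes y = alpha*A*x + beta*y for the non-transposed case,
 * with INCX = INCY = 1. A is stored column-major, so element (i, j)
 * lives at a[i + j*lda]. Returns -1 if the arguments violate
 * M >= 0, N >= 0, LDA >= max(1, M); returns 0 otherwise. */
static int ref_sgemv_n(int m, int n, float alpha, const float *a, int lda,
                       const float *x, float beta, float *y)
{
    int min_lda = (m > 1) ? m : 1;          /* max(1, m) */
    if (m < 0 || n < 0 || lda < min_lda)
        return -1;

    for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j) {
            /* Only rows 0..m-1 of each column are read; the padding
             * rows m..lda-1 are skipped by the lda stride. */
            acc += a[(size_t)i + (size_t)j * (size_t)lda] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
    return 0;
}
```

For example, with `m = 2`, `n = 2`, `lda = 4`, the array `{1, 2, 99, 99, 3, 4, 99, 99}` holds the matrix with columns (1, 2) and (3, 4); the `99` entries are padding and do not affect the result.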

The CUDA documentation for cublas<t>gemv() says:

```
m number of rows of matrix A.
n number of columns of matrix A.
A <type> array of dimension lda x n with lda >= max(1,n) if transa==CUBLAS_OP_N and lda x m with lda >= max(1,n) otherwise.
```

Should the last line instead read as follows?

```
A <type> array of dimension lda x n with lda >= max(1,m) if transa==CUBLAS_OP_N and lda x m with lda >= max(1,n) otherwise.
```

That is, should the first inequality be lda >= max(1,m) instead of lda >= max(1,n)?

Thanks.