64-bit integer CUBLAS support

In the CUBLAS documentation, I see that matrix dimensions and vector lengths are passed as 32-bit integers. For example:

cublasStatus_t cublasSetVector(int n, int elemSize, const void *x, int incx, void *y, int incy)

Since n is a 32-bit integer, I cannot transfer a vector with more than 2^31 - 1 entries to my 32 GB GPU.
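To make the limit concrete, here is a small host-side sketch of the arithmetic. The helper names fits_in_int and bytes_needed are made up for illustration, nothing here calls CUBLAS, and a 32-bit int is assumed.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: true if n can be passed as the `int n` argument of a
// 32-bit CUBLAS entry point such as cublasSetVector.
constexpr bool fits_in_int(std::uint64_t n) {
    return n <= static_cast<std::uint64_t>(INT_MAX);
}

// Hypothetical helper: bytes occupied by n elements of elem_size bytes each.
// No overflow check; the inputs used below are far from the 64-bit limit.
constexpr std::uint64_t bytes_needed(std::uint64_t n, std::uint64_t elem_size) {
    return n * elem_size;
}

constexpr std::uint64_t GiB = 1ull << 30;

// Assumes a 32-bit int, so the largest usable n is 2^31 - 1.
static_assert(fits_in_int((1ull << 31) - 1), "2^31 - 1 is the largest usable n");
static_assert(!fits_in_int(1ull << 31), "2^31 no longer fits in an int");

// 2^31 doubles occupy 16 GiB, which is less than the memory of a 32 GB card.
static_assert(bytes_needed(1ull << 31, sizeof(double)) == 16 * GiB, "16 GiB");
```

So a vector of doubles that fills only about half of the card already has a length that cannot be expressed in the int n argument.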

Intel/MKL solved this problem many years ago through the use of compiler flags. Has NVIDIA implemented something similar, so that I can use CUBLAS to its fullest potential on my 32 GB GPU?


If you would like to transmit more than 2^31 entries, use cudaMemcpy (or cudaMemcpy2D) instead of cublasSetVector/cublasGetVector.
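The reason this works is that cudaMemcpy takes its byte count as a size_t rather than an int. A minimal sketch of a 64-bit replacement for the contiguous case follows. The wrapper name set_vector_64 and the host_copy stand-in are hypothetical, and the copy routine is passed in as a parameter so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper with a 64-bit element count. `copy` is any callable
// shaped like (void* dst, const void* src, std::size_t bytes) -> int, where 0
// means success. Contiguous data only (the incx == incy == 1 case).
template <class CopyFn>
int set_vector_64(std::size_t n, std::size_t elem_size,
                  const void* x, void* y, CopyFn copy) {
    // Reject counts whose byte size would not fit in size_t.
    if (elem_size != 0 && n > SIZE_MAX / elem_size) {
        return -1;
    }
    return copy(y, x, n * elem_size);
}

// Host stand-in used here so the sketch runs without a GPU.
inline int host_copy(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
    return 0;
}
```

With the CUDA runtime, the copy argument could be a lambda that returns static_cast<int>(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice)), since cudaSuccess is 0. Strided vectors (incx or incy other than 1) are not handled here; that is the case cudaMemcpy2D is meant for.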

For certain math operations, the cublasXt API provides size_t ranges.

The cublasLt API also provides 64-bit ranges for matrix dimensions.

If you would like to see a change in the cublas API, it’s recommended that you file a bug with RFE in the title. The instructions for filing a bug are linked from a sticky post at the top of the CUDA programming forum.

Thanks, Robert!

Yes, I can use cudaMemcpy. I picked cublasSetVector just as an example.

I actually need quite a bit more than the BLAS-3 routines that cublasXt offers: I use BLAS-1, BLAS-2, and BLAS-3 routines. Also, cublasXt routines must have complete control over the GPU memory, and my software cannot allow that.
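For the unit-stride BLAS-1 routines, one possible workaround is to split a long vector into pieces whose lengths each fit in an int and issue one call per piece. The sketch below is only an assumption about how that could look, not something CUBLAS provides; for_each_chunk is a hypothetical helper, and the max_chunk parameter exists mainly so that the logic can be tried with small numbers.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: split [0, n) into pieces of at most max_chunk elements
// and call fn(offset, len) for each piece, with len small enough for an int.
// Stops at the first nonzero status and returns it; returns 0 otherwise.
template <class Fn>
int for_each_chunk(std::size_t n, Fn fn,
                   std::size_t max_chunk = static_cast<std::size_t>(INT_MAX)) {
    assert(max_chunk > 0 && max_chunk <= static_cast<std::size_t>(INT_MAX));
    for (std::size_t offset = 0; offset < n; ) {
        const std::size_t remaining = n - offset;
        const std::size_t len = remaining < max_chunk ? remaining : max_chunk;
        const int status = fn(offset, static_cast<int>(len));
        if (status != 0) return status;
        offset += len;
    }
    return 0;
}
```

For an axpy, fn could call cublasDaxpy(handle, len, &alpha, x + offset, 1, y + offset, 1) and return the status, since CUBLAS_STATUS_SUCCESS is 0. Reductions such as dot products would additionally need their partial results combined, and this does nothing for strided vectors or for BLAS-2 and BLAS-3 routines whose matrix dimensions exceed the int range.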

I recently started using the cuSOLVER dense routines, and I noticed that the same 32-bit limitation applies there as well.

I will submit a bug regarding this issue.

Thanks again!