In the cuBLAS documentation, I see that the data type for integers carrying matrix sizes and vector lengths is a 32-bit int. For example:
cublasStatus_t cublasSetVector(int n, int elemSize, const void *x, int incx, void *y, int incy)
Since n is a 32-bit int, I cannot transfer a vector with more than 2^31 entries to my 32 GB GPU.
Intel solved this problem in MKL many years ago with its ILP64 interface, selected through compiler/link flags. Has NVIDIA implemented something similar, so that I can use cuBLAS to its fullest potential on my 32 GB GPU?
Thanks!
If you would like to transfer more than 2^31 entries, use cudaMemcpy (or cudaMemcpy2D) instead of cublasSetVector/cublasGetVector.
For certain math operations, the cublasXt API accepts size_t dimensions:
[url]https://docs.nvidia.com/cuda/cublas/index.html#unique_1199120842[/url]
The cublasLt API also supports 64-bit matrix dimensions:
[url]https://docs.nvidia.com/cuda/cublas/index.html#using-the-cublasLt-api[/url]
[url]https://docs.nvidia.com/cuda/cublas/index.html#cublasLtMatrixLayoutCreate[/url]
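For illustration, a minimal (untested) sketch of describing a matrix taller than 2^31 rows through cublasLtMatrixLayoutCreate, which takes 64-bit row/column counts; the dimensions and function name here are hypothetical, and real code would also check the returned status:

```cpp
#include <cublasLt.h>
#include <cstdint>

// Sketch: cublasLtMatrixLayoutCreate takes uint64_t rows/cols and an
// int64_t leading dimension, so a dimension larger than 2^31 can be
// described directly for use with cublasLtMatmul.
void describe_tall_matrix() {
    cublasLtMatrixLayout_t layout;
    uint64_t rows = 3ULL * (1ULL << 30);  // would overflow a 32-bit int
    uint64_t cols = 4;
    int64_t  ld   = (int64_t)rows;        // column-major leading dimension
    cublasLtMatrixLayoutCreate(&layout, CUDA_R_32F, rows, cols, ld);
    // ... use the layout in a cublasLtMatmul call ...
    cublasLtMatrixLayoutDestroy(layout);
}
```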
If you would like to see a change in the cuBLAS API, it's recommended that you file a bug with RFE in the title. The instructions for filing a bug are linked in a sticky post at the top of the CUDA programming forum.
Thanks, Robert!
Yes, I can use cudaMemcpy. I picked cublasSetVector just as an example.
I actually need quite a bit more than the BLAS-3-level routines found in cublasXt; I use BLAS-1, BLAS-2, and BLAS-3 routines. Also, the cublasXt routines require complete control over GPU memory, which my software cannot allow.
I recently started using the cuSOLVER dense routines, and I just noticed that the same 32-bit limitation is there as well.
I will submit a bug regarding this issue.
Thanks again!