There are explicitly published limits on the *dimensions* of matrices in the specification of the relevant interfaces, e.g.:

```
cublasSgemm(cublasHandle_t handle,
            cublasOperation_t transa, cublasOperation_t transb,
            int m, int n, int k, ...
```

On all platforms supported by CUDA and CUBLAS, `int` is a signed 32-bit integer type that can represent values in [-2^{31}, 2^{31}-1].

As long as no individual dimension exceeds those limits and there is enough GPU memory to hold the matrices, CUBLAS should work fine with a matrix of more than 2^{32}-1 elements. I haven't personally tried that, as I don't have a GPU with enough memory to hold an 8+ GB matrix. If there is evidence to the contrary, I would consider that a bug, in which case you would want to file a bug report with NVIDIA.

Which CUBLAS function in particular do you observe failing for a matrix with more than 2^{32}-1 elements, and what are the actual dimensions of that matrix?