Hi, it seems that cuBLAS uses signed int32 for indices, so it's impossible to create tensors with more than 2^31-1 elements.
Even with float16, the biggest possible tensor therefore occupies less than 4 GiB of GPU memory.