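Large matrices can be multiplied in sub-cubic time: Strassen's algorithm runs in about O(n^2.81), and asymptotically faster methods get below O(n^2.7).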
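
One way to see where a sub-cubic bound comes from is Strassen's algorithm, which replaces the 8 half-size products of the usual block multiplication with 7, giving O(n^log2(7)), roughly O(n^2.81). The sketch below assumes that is the kind of method meant here. It is plain Python on lists of lists, the helper names and the `cutoff` parameter are illustrative and not from any library, and it only recurses while the size is even (otherwise it falls back to the naive product).

```python
def naive_matmul(a, b):
    """Textbook O(n^3) product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split a square matrix of even size into four quadrants."""
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])


def strassen(a, b, cutoff=2):
    """Multiply square matrices a and b using Strassen's 7-product recursion.

    Falls back to naive_matmul when the size is <= cutoff or odd
    (a simplification for this sketch; padding is another option).
    """
    n = len(a)
    if n <= cutoff or n % 2:
        return naive_matmul(a, b)

    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    # Seven recursive products instead of eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)

    # Recombine the products into the four quadrants of the result.
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

Calling `strassen(a, b)` on two square integer matrices should give the same result as `naive_matmul(a, b)`. A real implementation would use a much larger cutoff, since the extra additions only pay off for large blocks.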