| Topic | Replies | Views | Activity |
|---|---|---|---|
| cuBLAS launch 5 times threads blocks more than expected | 3 | 154 | April 11, 2024 |
| Undefined reference to `cublasCreate_v2' | 16 | 29212 | April 9, 2024 |
| Inaccurate results for int8 in cublasGemmEx | 4 | 254 | April 19, 2024 |
| cublasGemmEx() should not return success when the scalar type is not correct | 0 | 100 | April 2, 2024 |
| Graph Capture of cublasDdot in Device Pointer Mode | 3 | 132 | March 26, 2024 |
| How to use negative leading dimension in cuBLASLt matmul interface? | 0 | 122 | March 13, 2024 |
| cuBLAS Level-1 amax execution error | 1 | 149 | March 11, 2024 |
| Large % of time in cuBLAS calls spent in clock_gettime | 3 | 160 | March 6, 2024 |
| Minor bugs in header file "cublasmp.h" of cuBLASMp | 1 | 209 | March 5, 2024 |
| Can not compile cublas file in windows10 | 3 | 271 | March 19, 2024 |
| Stripmining matmul for bandwidth optimization host-to-gpu for LLM computation | 2 | 176 | February 26, 2024 |
| Tensor core architecture deep-dive any whitepaper blog available? | 1 | 248 | February 20, 2024 |
| [cuBLASDx] Support for MM where input type != output type? | 0 | 210 | February 3, 2024 |
| [cuBLASDx] no instance of overloaded function "__half::__half" matches the specified type | 2 | 283 | January 30, 2024 |
| Something wrong after cublasSmatinvBatched!! | 12 | 363 | December 31, 2023 |
| cuBLAS GEMM 2.5 times slower on 4090 than on 3090? | 0 | 272 | December 25, 2023 |
| I have a question about CUDA boot auto-start | 0 | 284 | December 23, 2023 |
| cuBLAS 12 graphs cannot be used as child graphs because of stream ordered memory allocation | 4 | 458 | December 20, 2023 |
| cublasLt optimize memory usage for triangular matrix | 0 | 242 | December 15, 2023 |
| cublasScnrm2(...) keeps crashing and get Segmentation fault (core dumped) $EXECUTABLE | 3 | 316 | December 12, 2023 |
| Statically link cuBlas/cuSparse on Windows? | 2 | 709 | November 27, 2023 |
| How to use CUBLASLT_POINTER_MODE_DEVICE_VECTOR in cublasLt | 0 | 246 | November 22, 2023 |
| Wrong results when using input tensor as output tensor for cuTENSOR | 1 | 350 | November 20, 2023 |
| How multiply a matrix and vector | 0 | 325 | November 11, 2023 |
| Why there is always a memset kernel before a cublas matrix multiplication kernel? | 1 | 280 | November 13, 2023 |
| CuBLAS GeMM + Bias | 0 | 304 | November 12, 2023 |
| How to enable Tensor core for cublasSgemmBatched on H100? | 5 | 404 | November 17, 2023 |
| Ada GeForce (RTX 4090) FP8 cuBLASLt performance | 7 | 6039 | November 2, 2023 |
| cuBLAS GEMM INT8 is much slower than FP16 in T4 | 11 | 3228 | November 2, 2023 |
| Why does cublas still choose column major format instead of row major for matrix? | 2 | 352 | October 30, 2023 |