I got the following error while training my PyTorch model with bfloat16 parameters:
File "/opt/conda/envs/XXX/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 0 m 768 n 2304 k 768 mat1_ld 768 mat2_ld 768 result_ld 768 abcType 14 computeType 68 scaleType 0
The dtypes of input, self.weight, and self.bias are all bfloat16, and their shapes are (9, 256, 768), (768, 768), and (768,), respectively.
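As a sanity check, the GEMM dimensions in the error message can be compared against these shapes. The sketch below is only an illustration: the helper name parse_gemm_params is hypothetical, and it assumes that F.linear flattens the leading input dimensions into one row count before the matmul, which is how the m/n/k values in the message appear to line up.

```python
import re

# Error text copied from the traceback above.
ERROR_TEXT = (
    "CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with "
    "transpose_mat1 1 transpose_mat2 0 m 768 n 2304 k 768 "
    "mat1_ld 768 mat2_ld 768 result_ld 768 "
    "abcType 14 computeType 68 scaleType 0"
)


def parse_gemm_params(text):
    """Collect the 'name integer' pairs that follow ' with ' in the message.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    _, _, tail = text.partition(" with ")
    pairs = re.findall(r"([A-Za-z_]\w*) (\d+)", tail)
    return {name: int(value) for name, value in pairs}


params = parse_gemm_params(ERROR_TEXT)

# Shapes reported in the post.
input_shape = (9, 256, 768)
weight_shape = (768, 768)  # (out_features, in_features)

# Assumption: the leading input dimensions are flattened into one row count.
rows = input_shape[0] * input_shape[1]
out_features, in_features = weight_shape

print("m matches out_features:", params["m"] == out_features)
print("n matches flattened rows:", params["n"] == rows)
print("k matches in_features:", params["k"] == in_features)
```

If all three comparisons hold, the failing call is the plain (2304 x 768) by (768 x 768) product, so the shapes themselves look ordinary and the problem is more likely in how the tensors are laid out or in the library versions involved.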
My PyTorch version is 1.14.0.dev20221213+cu116, and my Python version is 3.8.15.
I am using 8 A100 GPUs (80 GB).
I ran torch.cuda.is_bf16_supported() and got True.
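Since the model itself cannot be shared, a standalone call with the same dtypes and shapes might serve as a starting point for a reproduction. This is a minimal sketch under assumptions: the function name run_bf16_linear is made up for illustration, the tensors are random and contiguous, and it is not confirmed that this snippet triggers the same error outside the full model (a non-contiguous or oddly strided input inside the model could be the actual cause).

```python
import torch
import torch.nn.functional as F


def run_bf16_linear(device):
    """Call F.linear with the dtypes and shapes reported in the post.

    Hypothetical helper for illustration; tensors are random and contiguous.
    """
    x = torch.randn(9, 256, 768).to(device=device, dtype=torch.bfloat16)
    weight = torch.randn(768, 768).to(device=device, dtype=torch.bfloat16)
    bias = torch.randn(768).to(device=device, dtype=torch.bfloat16)
    return F.linear(x, weight, bias)


# Fall back to CPU so the script still runs on a machine without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

print("torch version:", torch.__version__)
if device == "cuda":
    print("bf16 supported:", torch.cuda.is_bf16_supported())

out = run_bf16_linear(device)
print("output shape:", tuple(out.shape), "dtype:", out.dtype)
```

If this runs cleanly on the same machine, comparing the stride and contiguity of the real input (input.stride(), input.is_contiguous()) against the tensors here would be a reasonable next thing to report.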
I have tried other models with BF16 parameters and did not get this error. Unfortunately, I cannot share my model with you. Please let me know if you have any idea what causes the error. Thanks.