Pointer alignment requirement for the cublasGemmBatchedEx API

cublasGemmBatchedEx requires the matrix pointers to be aligned in any of the cases below:

1. Atype is CUDA_R_16F or CUDA_R_16BF

2. computeType is any of the FAST options

3. algo enables fast math modes

The alignment must follow these rules:

1. if k % 8 == 0, then intptr_t(ptr) % 16 == 0

2. if k % 2 == 0, then intptr_t(ptr) % 4 == 0
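
As a rough illustration, the two rules can be written as a host-side check. This is only a sketch: `recommended_alignment` and `meets_alignment_rule` are hypothetical helper names, not part of cuBLAS, and returning 1 for odd k is my own assumption, since the rules say nothing about that case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a cuBLAS function): byte alignment that the two
// rules above ask for, given the GEMM inner dimension k.
inline std::uintptr_t recommended_alignment(int k) {
    assert(k > 0);
    if (k % 8 == 0) return 16;  // rule 1
    if (k % 2 == 0) return 4;   // rule 2
    return 1;                   // assumption: the rules do not cover odd k
}

// Hypothetical helper: true if ptr satisfies the rule for the given k.
inline bool meets_alignment_rule(const void* ptr, int k) {
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    return addr % recommended_alignment(k) == 0;
}
```

Calling `meets_alignment_rule` on every matrix pointer before filling the pointer arrays would show which matrices fall outside the rule for a given k.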

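My case is as follows:

1. lhs_shape → [5,128,8,1,64], rhs_shape → [5,1,8,64,201]

2. Atype = float (CUDA_R_32F), computeType = CUBLAS_COMPUTE_32F_FAST_TF32

3. mathmode = CUBLAS_TF32_TENSOR_OP_MATH

Thus the output shape is [5,128,8,1,201]. Here k = 64, so rule 1 asks for 16-byte alignment, but each output matrix holds 1 × 201 floats = 804 bytes, which is not a multiple of 16. Assuming the output is float and stored contiguously, the pointers to most of the output matrices therefore do not meet the alignment requirement.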
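
To make the numbers concrete, here is a small sketch that counts how many matrices in a contiguous batch start on a given byte boundary. `count_aligned_matrices` is a hypothetical helper, and it assumes the base address of the buffer is itself aligned and that the matrices are packed back to back with no padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the matrices in a contiguous batch whose byte
// offset from the (assumed aligned) base address is a multiple of `alignment`.
inline std::size_t count_aligned_matrices(std::size_t batch_count,
                                          std::size_t rows, std::size_t cols,
                                          std::size_t elem_size,
                                          std::size_t alignment) {
    assert(alignment > 0);
    const std::size_t stride_bytes = rows * cols * elem_size;  // bytes per matrix
    std::size_t aligned = 0;
    for (std::size_t i = 0; i < batch_count; ++i) {
        if ((i * stride_bytes) % alignment == 0) ++aligned;
    }
    return aligned;
}
```

For the output above, `count_aligned_matrices(5 * 128 * 8, 1, 201, sizeof(float), 16)` gives 1280 out of 5120 matrices, i.e. only every fourth output matrix starts on a 16-byte boundary, while all of them are 4-byte aligned. The lhs (1 × 64 floats = 256 bytes) and rhs (64 × 201 floats = 51456 bytes) matrices all stay 16-byte aligned under the same assumptions.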

However, I was able to use cublasGemmBatchedEx successfully with cuBLAS 12.2.1.16, without any misaligned-access error.

Can anyone explain this?

According to the CUDA documentation, the wording is “recommend that they meet ”, so the rules are a recommendation rather than a hard requirement.

So maybe it is just a coincidence that this case works with this API.

But it is highly recommended not to rely on that.