Batched Matrix Multiplication using cuBLAS

Hi NVIDIA Team,

I am working on registering a plugin for an operator (Einsum) that is not currently supported in TensorRT. Instead of implementing a custom CUDA kernel, I want to use the cuBLAS library for batched matrix multiplication.

The equations I want to implement (from the Einsum operator) are:
"ntg, ncg -> nct" and "nct, ncp -> ntp" (both are batched matrix multiplications over the batch index n)

Info about Einsum op: onnx/ at master · onnx/onnx · GitHub

I need guidance on using the cuBLAS library for batched matrix multiplication for the above two ops.

I have been going through the available references (cuBLAS :: CUDA Toolkit Documentation; Pro Tip: cuBLAS Strided Batched Matrix Multiply | NVIDIA Developer Blog), but I do not understand how to apply them to the above equations.
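To make the question concrete, here is my current guess at the argument mapping, written as a CPU-only C++ sketch so it can be checked without a GPU. The function gemm_strided_batched_ref and the enum Op are hypothetical helpers of mine, not cuBLAS APIs; they only imitate the argument order and column-major convention of cublasSgemmStridedBatched. I am assuming float32 tensors stored contiguously in row-major order. Since cuBLAS treats memory as column-major, each row-major output is computed as its transpose, which is why the operands are swapped in the second equation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUBLAS_OP_N / CUBLAS_OP_T.
enum class Op { N, T };

// CPU reference with the same argument meaning as cublasSgemmStridedBatched
// (minus the handle): column-major storage,
// C_b = alpha * op(A_b) * op(B_b) + beta * C_b for each batch b.
void gemm_strided_batched_ref(Op ta, Op tb, int m, int n, int k, float alpha,
                              const float* A, int lda, long long strideA,
                              const float* B, int ldb, long long strideB,
                              float beta, float* C, int ldc, long long strideC,
                              int batch) {
    assert(ldc >= m);  // a column of C must hold at least m elements
    for (int b = 0; b < batch; ++b) {
        const float* Ab = A + b * strideA;
        const float* Bb = B + b * strideB;
        float* Cb = C + b * strideC;
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p) {
                    // op(A)(i,p) and op(B)(p,j) in column-major indexing
                    float a = (ta == Op::N) ? Ab[i + (long long)p * lda]
                                            : Ab[p + (long long)i * lda];
                    float v = (tb == Op::N) ? Bb[p + (long long)j * ldb]
                                            : Bb[j + (long long)p * ldb];
                    acc += a * v;
                }
                float& c = Cb[i + (long long)j * ldc];
                c = alpha * acc + beta * c;
            }
        }
    }
}

// "ntg,ncg->nct": out[n,c,t] = sum_g x[n,t,g] * y[n,c,g]
// x is [N,T,G], y is [N,C,G], out is [N,C,T], all row-major (assumed).
std::vector<float> einsum_ntg_ncg_nct(const std::vector<float>& x,
                                      const std::vector<float>& y,
                                      int N, int T, int G, int C) {
    std::vector<float> out((std::size_t)N * C * T, 0.0f);
    gemm_strided_batched_ref(Op::T, Op::N, /*m=*/T, /*n=*/C, /*k=*/G, 1.0f,
                             x.data(), /*lda=*/G, (long long)T * G,
                             y.data(), /*ldb=*/G, (long long)C * G,
                             0.0f, out.data(), /*ldc=*/T, (long long)C * T, N);
    return out;
}

// "nct,ncp->ntp": out[n,t,p] = sum_c x[n,c,t] * y[n,c,p]
// x is [N,C,T], y is [N,C,P], out is [N,T,P], all row-major (assumed).
// Note that y is passed as the first GEMM operand and x as the second.
std::vector<float> einsum_nct_ncp_ntp(const std::vector<float>& x,
                                      const std::vector<float>& y,
                                      int N, int C, int T, int P) {
    std::vector<float> out((std::size_t)N * T * P, 0.0f);
    gemm_strided_batched_ref(Op::N, Op::T, /*m=*/P, /*n=*/T, /*k=*/C, 1.0f,
                             y.data(), /*lda=*/P, (long long)C * P,
                             x.data(), /*ldb=*/T, (long long)C * T,
                             0.0f, out.data(), /*ldc=*/P, (long long)T * P, N);
    return out;
}
```

If this mapping is right, I would expect the same transpose flags, m/n/k, leading dimensions, and strides to carry over to the real cublasSgemmStridedBatched call with device pointers, but I would appreciate confirmation.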

Could you please assist me with this?

Thanks in advance,
Darshan C G