Can cublasXt be used to process batched matmul?

I want to use the hybrid CPU-GPU feature of cublasXt to reduce the device-memory usage of matrix multiplication, and it works for a single matmul. Can cublasXt handle batched matmul as well? If not, is there another library with a similar feature for reducing the device-memory usage of batched matmul?
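For context, the single-matmul case that works for me looks roughly like this (a minimal sketch, assuming one GPU at device ordinal 0 and square `n x n` matrices; cuBLASXt accepts plain host pointers and tiles the operands to the device itself, which is what keeps device-memory usage bounded):

```c
#include <cublasXt.h>
#include <stdlib.h>

int main(void) {
    size_t n = 4096;
    /* Plain host allocations: cuBLASXt streams tiles of these to the GPU,
       so the full matrices never need to fit in device memory. */
    float *A = malloc(n * n * sizeof(float));
    float *B = malloc(n * n * sizeof(float));
    float *C = malloc(n * n * sizeof(float));

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);

    int devices[1] = {0};              /* use GPU 0 only (assumption) */
    cublasXtDeviceSelect(handle, 1, devices);

    float alpha = 1.0f, beta = 0.0f;
    /* C = alpha * A * B + beta * C, column-major, no transposes */
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                  &alpha, A, n, B, n, &beta, C, n);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}
```

For a batch I can of course call `cublasXtSgemm` in a loop over the batch dimension, but I have not found a batched entry point in the cublasXt API itself, hence the question.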