Matrix product along batch axis & backpropagation

Hi,
I am Peter and rather new to the CUDA community. I am currently facing the problem of calculating how a matrix propagates through a chain of matrices.

My computational problem: I have matrices A, B, C,… stored in one array as a strided batch in Fortran (column-major) order; all matrices have the same size. For one probe matrix X (always the same), I need to calculate
X•A•B•C•…, then A•X•B•C•…, then A•B•X•C•…, and so on.
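To make the computation concrete, here is a minimal CPU-only reference sketch in plain C++ (not CUDA, and not tuned for speed). It assumes square n x n matrices stored back to back in column-major order, and it assumes that "and so on" includes the final case where X comes last. The names `matmul`, `identity` and `probe_chain` are made up for illustration. The idea is to build the running products from the left (prefix) and from the right (suffix) once, so each insertion position only costs two more multiplications.

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<double>;  // one n x n matrix, column-major (Fortran order)

// Returns a * b for two n x n column-major matrices.
Mat matmul(const double* a, const double* b, std::size_t n) {
    Mat c(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t i = 0; i < n; ++i)
                c[i + j * n] += a[i + k * n] * b[k + j * n];
    return c;
}

// Returns the n x n identity matrix.
Mat identity(std::size_t n) {
    Mat m(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) m[i + i * n] = 1.0;
    return m;
}

// batch holds `count` matrices M_0 ... M_{count-1} with stride n*n.
// result[p] = M_0 ... M_{p-1} * X * M_p ... M_{count-1}, for p = 0 ... count
// (assumption: p == count, i.e. X at the very end, is wanted as well).
std::vector<Mat> probe_chain(const std::vector<double>& batch,
                             std::size_t count, std::size_t n, const Mat& x) {
    const std::size_t stride = n * n;

    // prefix[p] = M_0 ... M_{p-1}, with prefix[0] = I
    std::vector<Mat> prefix(count + 1);
    prefix[0] = identity(n);
    for (std::size_t p = 0; p < count; ++p)
        prefix[p + 1] = matmul(prefix[p].data(), batch.data() + p * stride, n);

    // suffix[p] = M_p ... M_{count-1}, with suffix[count] = I
    std::vector<Mat> suffix(count + 1);
    suffix[count] = identity(n);
    for (std::size_t p = count; p-- > 0;)
        suffix[p] = matmul(batch.data() + p * stride, suffix[p + 1].data(), n);

    // Combine: prefix * X * suffix for every insertion position.
    std::vector<Mat> result(count + 1);
    for (std::size_t p = 0; p <= count; ++p) {
        Mat left = matmul(prefix[p].data(), x.data(), n);
        result[p] = matmul(left.data(), suffix[p].data(), n);
    }
    return result;
}
```

With two matrices A and B, `probe_chain(batch, 2, n, X)` would return X•A•B, A•X•B and A•B•X in that order. The prefix and suffix loops are sequential, so on the GPU only the final combine step is an obvious fit for a batched multiplication.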

I wonder whether I can use the cuTENSOR library for that. Can it reduce the batch to a single matrix by performing a matrix multiplication along the batch axis? Do you have any hints on how to implement that?
For the whole problem: are there any CUDA libraries that provide such functionality already built in? I feel like this problem must have been solved in the context of AI, as it resembles forward and backward propagation, only with matrices of equal size…

Peter