Does cuTENSOR only support in-place ops?

I tested cuTENSOR and it seems to support only in-place ops. This is not mentioned in its documentation. Can anyone confirm this? (Or do people just not use it?)

In the majority of cases cuTENSOR does not allow in-place ops. There are a few exceptions to this rule, where one of the input tensors has exactly the same memory layout as the output tensor (e.g., tensor contractions of the form C = A * B + C, elementwise operations of the form B = perm(A) + B, and reductions of the form B = Reduction(A) + B).
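
To see why the layouts need to match, here is a rough sketch in plain Python. It is not the cuTENSOR API: `transpose_add` is a made-up helper and the flat lists only stand in for device buffers, processed sequentially. It computes an elementwise operation of the form out = perm(A) + B, once with the output aliasing B (same layout, so each element is read and written at the same index) and once with the output aliasing the permuted input A.

```python
# Illustration only: plain lists stand in for device memory.
# transpose_add is a hypothetical helper, not a cuTENSOR function.

def transpose_add(a, b, out, rows, cols, alpha=1.0, gamma=1.0):
    """Compute out[i][j] = alpha * a[j][i] + gamma * b[i][j].

    All buffers are flat and row-major: a is cols x rows,
    b and out are rows x cols.
    """
    for i in range(rows):
        for j in range(cols):
            out[i * cols + j] = alpha * a[j * rows + i] + gamma * b[i * cols + j]
    return out


a = [1, 2, 3, 4]      # 2 x 2 input that gets permuted (transposed)
b = [10, 20, 30, 40]  # 2 x 2 input with the same layout as the output

# Out-of-place reference: separate output buffer.
reference = transpose_add(a, b, [0] * 4, 2, 2)

# In-place on b: the output aliases the input that shares its layout.
b_alias = list(b)
in_place_ok = transpose_add(a, b_alias, b_alias, 2, 2)

# In-place on a: the output aliases the permuted input, so some
# elements are overwritten before they have been read.
a_alias = list(a)
clobbered = transpose_add(a_alias, [0] * 4, a_alias, 2, 2)

print(reference)    # [11.0, 23.0, 32.0, 44.0]
print(in_place_ok)  # [11.0, 23.0, 32.0, 44.0]
print(clobbered)    # [1.0, 3.0, 3.0, 4.0], not the transpose [1, 3, 2, 4]
```

Aliasing the output with B gives the same result as the out-of-place run, while aliasing it with the permuted input A corrupts the data. This matches the pattern of the exceptions listed above, although the real library's constraints should be checked against its documentation.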