Hi, I noticed that the cuTENSOR functions cutensorElementwiseBinary, cutensorElementwiseTrinary and cutensorContraction can produce numerically wrong results when the same tensor is used as both an input and the output, e.g.
C_{i,j,k,l} = alpha * C_{i,j,k,p} * B_{p,j,k,l} + beta * C_{i,j,k,l}
One could argue that using an input tensor as the output tensor is a user error, since one can imagine that this is bound to backfire. However, as far as I can tell this restriction is not mentioned in the cuTENSOR documentation (cuTENSOR Functions — cuTENSOR 1.7.0 documentation). And at least for cutensorContraction, where the user provides a workspace, a naive user (like me) might assume that this extra memory allows something like an in-place contraction to actually work.
Maybe cuTENSOR could return CUTENSOR_STATUS_NOT_SUPPORTED or CUTENSOR_STATUS_INVALID_VALUE in this situation.
I don’t know how cuBLAS handles this for the gemm functions, as I imagine the same problem exists for matrix multiplications. Is an error returned when the input and output matrices are identical, or is the user expected to know better and simply not do this?
The issue occurred for me on both a Tesla P100 and a Tesla V100 using gcc 10.2, CUDA 11.1 and cuTENSOR 1.7.0.
For cutensorElementwiseBinary and cutensorContraction I have attached two example files [same_tensor.tar.gz (3.1 KB)], which are based on the cuTENSOR samples from https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuTENSOR.