Hi, when I started working with the cuTENSOR library a few years ago, I discovered an unexpected behavior when calling the cutensorContractionGetWorkspace function (nowadays cutensorContractionGetWorkspaceSize) for certain combinations of modes/indices.
This was with cuTENSOR 1.2.2, and as far as I know the problem no longer occurs with version 1.3.0 and higher. However, since I can’t tell whether it has really been fixed or is just harder to trigger in newer versions, I thought it best to describe it here so people can have a look at it:
When calling cutensorContractionGetWorkspace for a cutensorContractionDescriptor that describes a contraction with certain modes, such as
C_{a,b} = alpha * A_{a,b,c} B_{c,a} + beta * C_{a,b}
or
C_{i,j,k,l} = alpha * A_{p,i,j} B_{p,j,k,l} + beta * C_{i,j,k,l}
I get an immediate segfault on my Tesla P100. On my Tesla V100 there is no segfault; instead, my RAM gets flooded by a memory leak until the program terminates with ‘out of memory’. Both systems use gcc 10.2, CUDA 11.1 and, as mentioned above, cuTENSOR 1.2.2.
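To make the setup concrete, here is a minimal sketch of how the workspace query looks for the first (faulty) mode combination, following the cuTENSOR 1.x API used in the linked contraction sample. The extents, data type, compute type, and placeholder alignment values are my own illustrative choices; the actual reproducers are in the attached archive below.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>
#include <unordered_map>

#include <cuda_runtime.h>
#include <cutensor.h>

int main()
{
    // Modes for C_{a,b} = alpha * A_{a,b,c} B_{c,a} + beta * C_{a,b}
    std::vector<int32_t> modeA{'a', 'b', 'c'};
    std::vector<int32_t> modeB{'c', 'a'};   // the faulty combination
    std::vector<int32_t> modeC{'a', 'b'};

    // Arbitrary extents, just for illustration
    std::unordered_map<int32_t, int64_t> extent{{'a', 96}, {'b', 96}, {'c', 96}};

    std::vector<int64_t> extentA, extentB, extentC;
    for (auto m : modeA) extentA.push_back(extent[m]);
    for (auto m : modeB) extentB.push_back(extent[m]);
    for (auto m : modeC) extentC.push_back(extent[m]);

    cutensorHandle_t handle;
    cutensorInit(&handle);

    cutensorTensorDescriptor_t descA, descB, descC;
    cutensorInitTensorDescriptor(&handle, &descA, modeA.size(), extentA.data(),
                                 nullptr /*default strides*/, CUDA_R_32F, CUTENSOR_OP_IDENTITY);
    cutensorInitTensorDescriptor(&handle, &descB, modeB.size(), extentB.data(),
                                 nullptr, CUDA_R_32F, CUTENSOR_OP_IDENTITY);
    cutensorInitTensorDescriptor(&handle, &descC, modeC.size(), extentC.data(),
                                 nullptr, CUDA_R_32F, CUTENSOR_OP_IDENTITY);

    // The sample queries these via cutensorGetAlignmentRequirement on the device
    // pointers; 128 is a placeholder since no device memory is allocated here.
    const uint32_t alignA = 128, alignB = 128, alignC = 128;

    cutensorContractionDescriptor_t desc;
    cutensorInitContractionDescriptor(&handle, &desc,
                                      &descA, modeA.data(), alignA,
                                      &descB, modeB.data(), alignB,
                                      &descC, modeC.data(), alignC,
                                      &descC, modeC.data(), alignC,
                                      CUTENSOR_COMPUTE_32F);

    cutensorContractionFind_t find;
    cutensorInitContractionFind(&handle, &find, CUTENSOR_ALGO_DEFAULT);

    // With cuTENSOR 1.2.2 this query segfaults (P100) or leaks host memory (V100)
    uint64_t workspaceSize = 0;
    cutensorStatus_t status = cutensorContractionGetWorkspace(
        &handle, &desc, &find, CUTENSOR_WORKSPACE_RECOMMENDED, &workspaceSize);

    printf("status = %s, workspaceSize = %llu\n",
           cutensorGetErrorString(status), (unsigned long long)workspaceSize);
    return 0;
}
```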
This occurs only for very specific combinations of indices. For example, changing the first contraction to
C_{a,b} = alpha * A_{a,b,c} B_{a,c} + beta * C_{a,b}
gets rid of the faulty behavior.
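In terms of the sketch above, the only change is the mode order of B:

```cpp
std::vector<int32_t> modeB{'a', 'c'};   // B_{a,c} instead of B_{c,a}: no segfault/leak
```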
For the two examples above I attached the files [workspace.tar.gz (2.8 KB)], which are based on the cuTENSOR contraction sample from https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuTENSOR.