Segmentation fault / memory leak for cuTENSOR function cutensorContractionGetWorkspace

Hi, when I started working with the cuTENSOR library a few years ago, I discovered an unexpected behavior when calling the cutensorContractionGetWorkspace function (nowadays cutensorContractionGetWorkspaceSize) for certain combinations of modes/indices.

This was with cuTENSOR 1.2.2, and as far as I know the problem no longer occurs with version 1.3.0 and higher. However, since I can’t tell whether it has really been fixed or is just harder to trigger in newer versions, I thought it best to describe it here so people can take a look at it:

When calling the function cutensorContractionGetWorkspace for a cutensorContractionDescriptor_t that describes contractions with certain mode combinations, such as

C_{a,b} = alpha * A_{a,b,c} B_{c,a} + beta * C_{a,b}


C_{i,j,k,l} = alpha * A_{p,i,j} B_{p,j,k,l} + beta * C_{i,j,k,l}

I get an immediate segfault on my Tesla P100. On my Tesla V100 there is no segfault; instead, host RAM fills up through a memory leak until the program terminates with ‘out of memory’. Both systems use gcc 10.2, CUDA 11.1 and, as mentioned above, cuTENSOR 1.2.2.
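To make the first failing case concrete, here is a plain CPU reference of the contraction C_{a,b} = alpha * A_{a,b,c} B_{c,a} + beta * C_{a,b}. This is only an illustration of the index pattern (it is not cuTENSOR code, and the 2x2x2 extents are arbitrary): mode c is the sole contracted mode, while mode a of B also appears in the output C.

```c
/* CPU reference for C_{a,b} = alpha * A_{a,b,c} * B_{c,a} + beta * C_{a,b}.
 * Illustration of the index pattern only, not cuTENSOR code; the 2x2x2
 * extents are arbitrary. Mode 'c' is the only contracted (summed) mode,
 * and mode 'a' of B also appears in the output C. */
enum { EA = 2, EB = 2, EC = 2 };  /* extents of modes a, b, c */

void contract_abc_ca(float alpha, const float A[EA][EB][EC],
                     const float B[EC][EA], float beta, float C[EA][EB])
{
    for (int a = 0; a < EA; ++a)
        for (int b = 0; b < EB; ++b) {
            float acc = 0.0f;
            for (int c = 0; c < EC; ++c)
                acc += A[a][b][c] * B[c][a];  /* B indexed as B_{c,a} */
            C[a][b] = alpha * acc + beta * C[a][b];
        }
}
```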

This occurs only for very specific index combinations. For example, changing the first example to

C_{a,b} = alpha * A_{a,b,c} B_{a,c} + beta * C_{a,b}

gets rid of the faulty behavior.
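For comparison, the only change in this working variant is the index order of B (B_{a,c} instead of B_{c,a}); again a plain CPU illustration, not cuTENSOR code:

```c
/* CPU reference for the reportedly working variant,
 * C_{a,b} = alpha * A_{a,b,c} * B_{a,c} + beta * C_{a,b}.
 * Identical to the failing case except for how B is indexed. */
enum { EA = 2, EB = 2, EC = 2 };  /* extents of modes a, b, c */

void contract_abc_ac(float alpha, const float A[EA][EB][EC],
                     const float B[EA][EC], float beta, float C[EA][EB])
{
    for (int a = 0; a < EA; ++a)
        for (int b = 0; b < EB; ++b) {
            float acc = 0.0f;
            for (int c = 0; c < EC; ++c)
                acc += A[a][b][c] * B[a][c];  /* only this indexing differs */
            C[a][b] = alpha * acc + beta * C[a][b];
        }
}
```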

For the two examples above I attached the file [workspace.tar.gz (2.8 KB)], which is based on the cuTENSOR contraction sample from

Hi JPJoost,

Thanks for filing this bug.

We’ve just released cuTENSOR 2.0 today, which has a much improved mechanism to query the workspace (it allows you to query the workspace that was actually used by the plan, see here). Moreover, 2.0 has much improved performance over 1.x (plots will follow shortly), so I’d encourage you to switch to 2.x.

Cheers, Paul