Hi, when I started working with the cuTENSOR library a few years ago, I discovered an unexpected behavior when calling the cutensorContractionGetWorkspace function (nowadays cutensorContractionGetWorkspaceSize) for certain combinations of modes/indices.
This was with cuTENSOR 1.2.2, and as far as I know the problem no longer occurs with version 1.3.0 and higher. However, since I can’t tell whether it has really been fixed or is just harder to trigger in newer versions, I thought it best to describe it here so people can have a look at it:
When calling cutensorContractionGetWorkspace for a cutensorContractionDescriptor that describes a contraction with certain modes, such as
C_{a,b} = alpha * A_{a,b,c} B_{c,a} + beta * C_{a,b}
or
C_{i,j,k,l} = alpha * A_{p,i,j} B_{p,j,k,l} + beta * C_{i,j,k,l}
I get an immediate segfault on my Tesla P100. On my Tesla V100 there is no segfault; instead, my RAM gets flooded by a memory leak until the program terminates with ‘out of memory’. Both systems use gcc 10.2, CUDA 11.1 and, as mentioned above, cuTENSOR 1.2.2.
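To make the setup concrete, here is a minimal sketch of how the workspace query looks for the first (faulty) mode combination, following the cuTENSOR 1.x API used in the linked contraction sample. The extents, data type, compute type, and placeholder alignment values are my own illustrative choices; the actual reproducers are in the attached archive below.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>
#include <unordered_map>

#include <cuda_runtime.h>
#include <cutensor.h>

int main()
{
    // Modes for C_{a,b} = alpha * A_{a,b,c} B_{c,a} + beta * C_{a,b}
    std::vector<int32_t> modeA{'a', 'b', 'c'};
    std::vector<int32_t> modeB{'c', 'a'};   // the faulty combination
    std::vector<int32_t> modeC{'a', 'b'};

    // Arbitrary extents, just for illustration
    std::unordered_map<int32_t, int64_t> extent{{'a', 96}, {'b', 96}, {'c', 96}};

    std::vector<int64_t> extentA, extentB, extentC;
    for (auto m : modeA) extentA.push_back(extent[m]);
    for (auto m : modeB) extentB.push_back(extent[m]);
    for (auto m : modeC) extentC.push_back(extent[m]);

    cutensorHandle_t handle;
    cutensorInit(&handle);

    cutensorTensorDescriptor_t descA, descB, descC;
    cutensorInitTensorDescriptor(&handle, &descA, modeA.size(), extentA.data(),
                                 nullptr /*default strides*/, CUDA_R_32F, CUTENSOR_OP_IDENTITY);
    cutensorInitTensorDescriptor(&handle, &descB, modeB.size(), extentB.data(),
                                 nullptr, CUDA_R_32F, CUTENSOR_OP_IDENTITY);
    cutensorInitTensorDescriptor(&handle, &descC, modeC.size(), extentC.data(),
                                 nullptr, CUDA_R_32F, CUTENSOR_OP_IDENTITY);

    // The sample queries these via cutensorGetAlignmentRequirement on the device
    // pointers; 128 is a placeholder since no device memory is allocated here.
    const uint32_t alignA = 128, alignB = 128, alignC = 128;

    cutensorContractionDescriptor_t desc;
    cutensorInitContractionDescriptor(&handle, &desc,
                                      &descA, modeA.data(), alignA,
                                      &descB, modeB.data(), alignB,
                                      &descC, modeC.data(), alignC,
                                      &descC, modeC.data(), alignC,
                                      CUTENSOR_COMPUTE_32F);

    cutensorContractionFind_t find;
    cutensorInitContractionFind(&handle, &find, CUTENSOR_ALGO_DEFAULT);

    // With cuTENSOR 1.2.2 this query segfaults (P100) or leaks host memory (V100)
    uint64_t workspaceSize = 0;
    cutensorStatus_t status = cutensorContractionGetWorkspace(
        &handle, &desc, &find, CUTENSOR_WORKSPACE_RECOMMENDED, &workspaceSize);

    printf("status = %s, workspaceSize = %llu\n",
           cutensorGetErrorString(status), (unsigned long long)workspaceSize);
    return 0;
}
```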
This occurs only for very specific combinations of indices. For example, changing the first contraction to
C_{a,b} = alpha * A_{a,b,c} B_{a,c} + beta * C_{a,b}
gets rid of the faulty behavior.
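In terms of the sketch above, the only change is the mode order of B:

```cpp
std::vector<int32_t> modeB{'a', 'c'};   // B_{a,c} instead of B_{c,a}: no segfault/leak
```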
For the two examples above I attached the files [workspace.tar.gz (2.8 KB)], which are based on the cuTENSOR contraction sample from https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuTENSOR.