cuTENSOR produces a runtime error on Linux, but not on Windows

Hello,
I rewrote YOLOv4 directly using the CUDA libraries, and it works on Windows. However, when compiling on Linux (Ubuntu 18.04), I get runtime errors when executing the function cutensorElementwiseBinary. The error code is 15, “CUTENSOR_STATUS_NOT_SUPPORTED”. On investigation, it seems that cutensorInitTensorDescriptor fills out the descriptor differently on the two operating systems. I am using CUDA 11 and cuTENSOR 1.3.3 on both.
The code can be found here: TKGgunter/yolov4_tiny_rs on GitHub (a Rust implementation of the yolov4_tiny algorithm).

Thanks for any help.

Would you be able to isolate the failing call so that we can have a closer look?

cuTENSOR’s logging capabilities (see the cuTENSOR 1.3.3 User Guide) might be useful to you; please check whether CUTENSOR_LOG_LEVEL=1 gives you more insight. If not, could you please report the output of CUTENSOR_LOG_LEVEL=5?

I’ll write a smaller standalone program later, but the relevant code links are below. The first link is to the first instance in main where the error is returned. The second is the source code of that function. The output of CUTENSOR_LOG_LEVEL=5 is given below.

[2021-09-27 08:35:48][cuTENSOR][24264][Api][cutensorInitTensorDescriptor] handle=0X7FFCBB67A9D0, desc_=0X7FFCBB6D2D98, numModes=4, extent=0X560BC4D143F0, stride=0X0, dataType=0, op=11
[2021-09-27 08:35:48][cuTENSOR][24264][Api][cutensorInitTensorDescriptor] handle=0X7FFCBB67A9D0, desc_=0X7FFCBB6D3000, numModes=4, extent=0X560BC4D143F0, stride=0X0, dataType=0, op=1

useA1 useB0 useC1
-Pcs -modeC97,98,99,100 -strideC1,13,169,507, -gamma0.700000 -opC1 -Pas -modeA97,98,99,100 -strideA1,13,169,507, -alpha1.300000 -opA11 -extent100=2,99=3,97=13,98=13, -opAB3 -opABC3 -Pcomps -Relementwise
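Both cutensorInitTensorDescriptor calls in the log pass stride=0X0, i.e. a NULL stride pointer, which the cuTENSOR documentation defines as a packed, generalized column-major layout. As a sanity check, the sketch below (packed_column_major_strides is a hypothetical host-side helper, not part of cuTENSOR) computes the strides that layout implies. With the extents in mode order 97, 98, 99, 100 taken from the log (13, 13, 3, 2), it gives 1, 13, 169, 507, which matches -strideA and -strideC, so the strides cuTENSOR derived look as expected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the strides implied when
 * cutensorInitTensorDescriptor receives stride == NULL, i.e. a packed,
 * generalized column-major layout. The first mode is contiguous (stride 1)
 * and each later stride is the previous stride times the previous extent. */
void packed_column_major_strides(uint32_t num_modes,
                                 const int64_t extent[],
                                 int64_t stride[])
{
    assert(num_modes == 0 || (extent != NULL && stride != NULL));
    int64_t running = 1;
    for (uint32_t i = 0; i < num_modes; ++i) {
        stride[i] = running;   /* stride of mode i */
        running *= extent[i];  /* elements spanned by modes 0..i */
    }
}
```

Passing such an explicit stride array to cutensorInitTensorDescriptor instead of NULL is one way to rule out a layout difference between the Windows and Linux builds; this is an untested suggestion, not a known fix.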

Just in case it is unclear, the function of interest is

calc_elementwise_binary

I have written an example; it can be found here: linux_error.rs in TKGgunter/yolov4_tiny_rs (master branch) on GitHub.
To run it, download the contents of cuda11-cutensor-sys in TKGgunter/yolov4_tiny_rs (master branch) on GitHub and run the command

cargo run --example linux_error

A C/C++ version can be found here: linux_error.c in TKGgunter/yolov4_tiny_rs (master branch) on GitHub.

Hello @pspringer, have you or someone else at NVIDIA been able to look into this problem? Are there any solutions or workarounds for this issue?