cuDLA API cudlaImportExternalSemaphore memory leak

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
[*] DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
[*] Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
[*] other

SDK Manager Version
1.9.3.10904
[*] other

Host Machine Version
[*] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Our team is using cuDLA in standalone mode. During engineering testing, we found that the cudlaImportExternalSemaphore API leaks 1 byte of memory every time it is called. We also call cudlaMemUnregister to release the resources that we use.
We used Valgrind to track down the memory leak.

Run the command as follows:
/opt/data/haihuawei/valgrind/bin/valgrind --tool=memcheck --leak-check=full --log-file=valgrind.log.txt ./cuDLAStandaloneMode ./resnet50/resnet_simp_conv_shape_1_hwc4_int8.cudla 5 0

The resources required are as follows:
https://drive.google.com/drive/folders/1ogWyTnq3nMb1sn4PT8VI79npHstmfifE?usp=sharing

memory-leak log:

==347623== Syscall param ioctl(generic) points to uninitialised byte(s)
==347623== at 0x4CEA70C: ioctl (ioctl.S:26)
==347623== by 0x4E85B47: ??? (in /usr/lib/libnvrm_host1x.so)
==347623== by 0x4EDA89B: ??? (in /usr/lib/libnvdla_runtime.so)
==347623== by 0x4EB889F: ??? (in /usr/lib/libnvdla_runtime.so)
==347623== by 0x4EB8A57: ??? (in /usr/lib/libnvdla_runtime.so)
==347623== by 0x4EB8E6F: ??? (in /usr/lib/libnvdla_runtime.so)
==347623== by 0x974C2D3: cudlaDrvModuleUnload (in /usr/lib/libnvcudla.so)
==347623== by 0x49396E7: cudlaModuleUnload (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudla.so.1.0.0)
==347623== by 0x40207F: main (in /opt/data/haihuawei/cuDLAStandaloneMode)
==347623== Address 0x1ffefff0d8 is on thread 1’s stack
==347623==
==347623==
==347623== HEAP SUMMARY:
==347623== in use at exit: 3,987 bytes in 13 blocks
==347623== total heap usage: 2,148 allocs, 2,135 frees, 71,683,024 bytes allocated
==347623==
==347623== 1 bytes in 1 blocks are definitely lost in loss record 1 of 10
==347623== at 0x484B828: malloc (vg_replace_malloc.c:442)
==347623== by 0x97249F7: ??? (in /usr/lib/libnvcucompat.so)
==347623== by 0x97447B7: ??? (in /usr/lib/libnvcudla.so)
==347623== by 0x97463EB: ??? (in /usr/lib/libnvcudla.so)
==347623== by 0x9749BCB: cudlaDrvImportExternalSemaphore (in /usr/lib/libnvcudla.so)
==347623== by 0x493BDD3: cudlaImportExternalSemaphore (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudla.so.1.0.0)
==347623== by 0x401C1B: main (in /opt/data/haihuawei/cuDLAStandaloneMode)
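
A minimal sketch for summarizing a log like the one above, assuming the Valgrind memcheck text format shown in this post. The names `LossRecord`, `parse_loss_records`, `entry_point` and `SAMPLE_LOG` are hypothetical helpers written for illustration and are not part of Valgrind or cuDLA. The sketch collects each "lost" record with its byte and block counts, and reports the outermost frame whose name starts with `cudla`.

```python
import re
from dataclasses import dataclass, field

# Matches e.g. "==347623== 1 bytes in 1 blocks are definitely lost in loss record 1 of 10"
LOSS_RE = re.compile(
    r"^==\d+==\s+([\d,]+) bytes in ([\d,]+) blocks are "
    r"(definitely|indirectly|possibly) lost in loss record"
)
# Matches e.g. "==347623== by 0x493BDD3: cudlaImportExternalSemaphore (in /path/lib.so)"
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (.+) \([^()]*\)\s*$")


@dataclass
class LossRecord:
    kind: str          # "definitely", "indirectly" or "possibly"
    bytes_lost: int
    blocks: int
    frames: list = field(default_factory=list)  # innermost frame first

    def entry_point(self, prefix="cudla"):
        """Return the outermost frame whose name starts with `prefix`, or None."""
        for name in reversed(self.frames):
            if name.startswith(prefix):
                return name
        return None


def parse_loss_records(log_text):
    """Collect loss records and the stack frames that follow each one."""
    records = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        loss = LOSS_RE.match(line)
        if loss:
            current = LossRecord(
                kind=loss.group(3),
                bytes_lost=int(loss.group(1).replace(",", "")),
                blocks=int(loss.group(2).replace(",", "")),
            )
            records.append(current)
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            current.frames.append(frame.group(1))
        elif not frame:
            current = None  # any non-frame line ends the current record
    return records


# Loss record copied from the log in this post.
SAMPLE_LOG = """
==347623== 1 bytes in 1 blocks are definitely lost in loss record 1 of 10
==347623== at 0x484B828: malloc (vg_replace_malloc.c:442)
==347623== by 0x97249F7: ??? (in /usr/lib/libnvcucompat.so)
==347623== by 0x97447B7: ??? (in /usr/lib/libnvcudla.so)
==347623== by 0x97463EB: ??? (in /usr/lib/libnvcudla.so)
==347623== by 0x9749BCB: cudlaDrvImportExternalSemaphore (in /usr/lib/libnvcudla.so)
==347623== by 0x493BDD3: cudlaImportExternalSemaphore (in /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudla.so.1.0.0)
==347623== by 0x401C1B: main (in /opt/data/haihuawei/cuDLAStandaloneMode)
"""

for record in parse_loss_records(SAMPLE_LOG):
    print(f"{record.kind}: {record.bytes_lost} bytes in {record.blocks} blocks "
          f"via {record.entry_point()}")
```

Valgrind merges allocations with the same stack trace into a single loss record, so the block count is the useful number here. If the leak really happens on every call, the blocks attributed to cudlaImportExternalSemaphore should grow with the number of imports made from the same call site. The excerpt above shows 1 byte in 1 block.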

Dear @haihua.wei,
Is it possible to test with the latest release, DRIVE OS 6.0.8.1?

@SivaRamaKrishnaNV
We don’t have an environment with DRIVE OS 6.0.8. Can you help us check whether the same issue reproduces on 6.0.8?

Dear @haihua.wei,
I want to confirm whether you are using the DRIVE AGX Orin Devkit platform, as you marked "other" under Hardware Platform.
Also, I see that cuDLAStandaloneMode is not part of the CUDA 11.4 GitHub samples or the Devzone release. May I know where you got the sample from? Is it from the CUDA 11.6 GitHub samples (cuda-samples/Samples/4_CUDA_Libraries/cuDLAStandaloneMode/main.cpp at v11.6 in NVIDIA/cuda-samples)?

@SivaRamaKrishnaNV That’s right. Our example is a modified version of the one from here: github.com

@SivaRamaKrishnaNV You can compile the program I provided in your own environment. The original example doesn’t run our model.

@SivaRamaKrishnaNV Yes, we are using the DRIVE AGX Orin Devkit platform.

Dear @haihua.wei,
How about using https://docs.nvidia.com/cuda/compute-sanitizer/index.html ?

@SivaRamaKrishnaNV The program itself runs without any problems. What we need to troubleshoot is the memory leak in the cuDLA runtime, so this doesn’t help us much.

Dear @haihua.wei,

Does the 1B leak in the post correspond to this?

Do you see no memleak messages when using compute-sanitizer?

@SivaRamaKrishnaNV
What is the environment of the board you are using?
I’m not sure whether compute-sanitizer catches cuDLA’s memory leak. Can you try it with Valgrind?

Dear @haihua.wei,
The cuDLA standard API seems to have memleak issues. We have informed the engineering team so they can fix this issue. Does this block your development?
