Memory Leak when using Virtual Memory API (cuMemImportFromShareableHandle)

Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

I have two applications that use the Virtual Memory API to perform CUDA IPC, in a similar manner to the memMapIPCDrv sample.
When testing them on an x86 Ubuntu 20.04 host (NVIDIA T4 GPU), everything works fine and memory consumption stays steady.
When testing them on NVIDIA Orin (both Jetson and DRIVE), memory usage keeps increasing steadily. I traced the problem to the cuMemImportFromShareableHandle function.
Each time this function executes, memory is allocated and never released, even though I call all the functions that should clean up: releasing the shareable handle and the allocation handle, unmapping the memory, and so on.
Any ideas on what is wrong?
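
For anyone trying to confirm growth like this from outside the process, here is a minimal sketch of a system-wide memory probe. It assumes Linux and defines "used" as MemTotal minus MemAvailable from /proc/meminfo. The attached demo may measure memory differently, and the function names are hypothetical. On a discrete GPU such as the T4, device memory does not show up in /proc/meminfo, so this probe is mainly meaningful on a shared-memory SoC like Orin.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_memory_mb(meminfo_text):
    """Used memory in MB, assumed here to be MemTotal - MemAvailable."""
    fields = parse_meminfo(meminfo_text)
    return (fields["MemTotal"] - fields["MemAvailable"]) // 1024


def read_used_memory_mb(path="/proc/meminfo"):
    """Sample the live system; call this once per import/release iteration."""
    with open(path) as f:
        return used_memory_mb(f.read())
```

Calling read_used_memory_mb() before and after each import/unmap/release cycle and printing the difference shows whether the number returns to its baseline or keeps climbing.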

Dear @bogdan.matei,
Could you share repro code to test on DRIVE?

demo.zip (7.3 KB)
Sure! Please see attached.
You need to run the server first, then the client.

Hi @bogdan.matei,
I see the issue below when I try to build using cmake:

nvidia@tegra-ubuntu:~/cuda_ipc/build$ cmake ..
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at server/CMakeLists.txt:8 (find_package):
  By not providing "FindCUDAToolkit.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "CUDAToolkit", but CMake did not find one.

  Could not find a package configuration file provided by "CUDAToolkit" with
  any of the following names:

    CUDAToolkitConfig.cmake
    cudatoolkit-config.cmake

  Add the installation prefix of "CUDAToolkit" to CMAKE_PREFIX_PATH or set
  "CUDAToolkit_DIR" to a directory containing one of the above files.  If
  "CUDAToolkit" provides a separate development package or SDK, be sure it
  has been installed.


-- Configuring incomplete, errors occurred!
See also "/home/nvidia/cuda_ipc/build/CMakeFiles/CMakeOutput.log".

Could you restructure it like a CUDA sample, so that it can be placed under /usr/local/cuda/samples/0_Simple/memMapIPCDrv on the target and built using make?

demo.zip (12.4 KB)

Hi @SivaRamaKrishnaNV,

Attached is the Makefile version of the demo. Extract it to cuda-samples/Samples/0_Simple and compile with make.

Does that mean that, on x86, used memory does not increase the way it does below?

nvidia@tegra-ubuntu:/usr/local/cuda/samples/0_Simple/test/server$ ./server
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2535
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2536
used memory (MB): 2537
used memory (MB): 2561

Indeed. Here is the output from x86:

used memory (MB): 228
used memory (MB): 228
used memory (MB): 228
used memory (MB): 228
used memory (MB): 228
used memory (MB): 228
used memory (MB): 228
used memory (MB): 228
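
To compare the two runs numerically, a small log parser can help. This is a sketch only: it assumes the "used memory (MB): N" line format shown in the outputs above, uses hypothetical function names, and says nothing about the sampling interval, which the logs do not state.

```python
import re

# Matches lines of the form "used memory (MB): 2535" as printed in the logs above.
USED_RE = re.compile(r"used memory \(MB\):\s*(\d+)")


def parse_used_mb(log_text):
    """Extract every 'used memory (MB)' sample from a log, in order."""
    return [int(m.group(1)) for m in USED_RE.finditer(log_text)]


def total_growth_mb(samples):
    """Difference between the last and first sample (0 for an empty log)."""
    return samples[-1] - samples[0] if samples else 0


# First and last samples taken from the two outputs posted in this thread.
orin_log = "used memory (MB): 2535\nused memory (MB): 2561\n"
x86_log = "used memory (MB): 228\nused memory (MB): 228\n"

print("Orin growth (MB):", total_growth_mb(parse_used_mb(orin_log)))
print("x86 growth (MB):", total_growth_mb(parse_used_mb(x86_log)))
```

Feeding the full server output through parse_used_mb gives the whole series, which makes it easier to tell steady growth from a one-off jump.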

I can repro the issue on x86 and DRIVE. I will check with the engineering team and update you. Thanks.
May I know whether this issue blocks your development, or whether you were just experimenting and happened to notice it?

It is a blocker for our development. We are implementing a solution that includes these operations.