Memory Leak when Using nvJitLinkAddData/nvJitLinkAddFile in CUDA JIT Compilation

Hi, I am working with CUDA JIT compilation using nvrtc and nvJitLink. I am seeing a memory leak when calling nvJitLinkAddData or nvJitLinkAddFile during the linking process.

Here is my most simplified code:

#include <cuda.h>
#include <nvJitLink.h>
#include <nvrtc.h>
#include <cstdlib>
#include <iostream>
#include <string>

#define NVRTC_SAFE_CALL(x)                                    \
    do                                                        \
    {                                                         \
        nvrtcResult result = x;                               \
        if (result != NVRTC_SUCCESS)                          \
        {                                                     \
            std::cerr << "\nerror: " #x " failed with error " \
                      << nvrtcGetErrorString(result) << '\n'; \
            exit(1);                                          \
        }                                                     \
    } while (0)
#define CUDA_SAFE_CALL(x)                                     \
    do                                                        \
    {                                                         \
        CUresult result = x;                                  \
        if (result != CUDA_SUCCESS)                           \
        {                                                     \
            const char *msg;                                  \
            cuGetErrorName(result, &msg);                     \
            std::cerr << "\nerror: " #x " failed with error " \
                      << msg << '\n';                         \
            exit(1);                                          \
        }                                                     \
    } while (0)
#define NVJITLINK_SAFE_CALL(h, x)                             \
    do                                                        \
    {                                                         \
        nvJitLinkResult result = x;                           \
        if (result != NVJITLINK_SUCCESS)                      \
        {                                                     \
            std::cerr << "\nerror: " #x " failed with error " \
                      << result << '\n';                      \
            size_t lsize;                                     \
            result = nvJitLinkGetErrorLogSize(h, &lsize);     \
            if (result == NVJITLINK_SUCCESS && lsize > 0)     \
            {                                                 \
                char *log = (char *)malloc(lsize);            \
                result = nvJitLinkGetErrorLog(h, log);        \
                if (result == NVJITLINK_SUCCESS)              \
                {                                             \
                    std::cerr << "error: " << log << '\n';    \
                    free(log);                                \
                }                                             \
            }                                                 \
            exit(1);                                          \
        }                                                     \
    } while (0)

int main()
{

    std::string program_name = "my_program";
    std::string program_source = R"(
  template <typename T>
  __global__ void my_kernel(T* data) { *data = T{7}; }
  )";

    for (int i = 0; i < 80000; i++)
    {
        nvrtcProgram prog;
        NVRTC_SAFE_CALL(
            nvrtcCreateProgram(&prog,                  // prog
                               program_source.c_str(), // buffer
                               program_name.c_str(),   // name
                               0,                      // numHeaders
                               NULL,                   // headers
                               NULL));                 // includeNames

        nvrtcResult compileResult = nvrtcCompileProgram(prog,  // prog
                                                        0,     // numOptions
                                                        NULL); // options

        // Abort if compilation failed.
        if (compileResult != NVRTC_SUCCESS)
        {
            exit(1);
        }

        nvJitLinkHandle handle;
        const char *lopts[] = {"-arch=sm_80"};
        NVJITLINK_SAFE_CALL(handle, nvJitLinkCreate(&handle, 1, lopts));
        NVJITLINK_SAFE_CALL(handle, nvJitLinkAddFile(handle, NVJITLINK_INPUT_OBJECT,
                                                     "/path/to/anyobject.o"));
        NVJITLINK_SAFE_CALL(handle, nvJitLinkComplete(handle));
        NVJITLINK_SAFE_CALL(handle, nvJitLinkUnload(handle));
        NVJITLINK_SAFE_CALL(handle, nvJitLinkDestroy(&handle));
        NVRTC_SAFE_CALL(nvrtcDestroyProgram(&prog));
        // printf("i: %d\n", i);
    }
}

The issue is that when I call nvJitLinkAddFile, memory usage increases continuously, which indicates a memory leak. However, when I comment out the line:

// NVJITLINK_SAFE_CALL(handle, nvJitLinkAddFile(handle, NVJITLINK_INPUT_OBJECT, "/path/to/anyobject.o"));

The memory leak stops, and the program runs as expected.

Compilation and Execution

I compile and run the code using the following command:

nvcc -std=c++14 ./main.cc -lnvrtc -lnvJitLink -ldl && ./a.out

Has anyone encountered similar memory leaks with nvJitLinkAddFile? I’ve simplified the code as much as possible, but the issue persists. Any help or insights would be greatly appreciated!

Thank you in advance!

At least up through CUDA 12.6.2, nvJitLinkUnload doesn’t appear to be part of the CUDA nvJitLink library. I also don’t find it documented in the current docs.

  1. Are you referring to host memory leaking or device memory leaking?
  2. Which CUDA version are you using?

When I run the code you posted on CUDA 12.2, with the nvJitLinkUnload line commented out, and monitor host memory usage in top, the application never rises above 0.2% of host memory. That is, over the course of several minutes, I see no increase in host memory usage that would be characteristic of a host memory leak.

(With respect to device memory usage, the posted code does nothing with any CUDA GPU, so there are no reported processes using any GPU, nor any device memory usage attributed to the process.)
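As a complement to watching the process in top, the reproducer could log its own resident set size at intervals, which makes any growth easy to compare across CUDA versions. The following is only a sketch and assumes a Linux system where /proc/self/status is readable; the helper names parse_vmrss_kb and current_rss_kb are hypothetical and not part of any CUDA library.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Parses one line of /proc/<pid>/status. Returns the value in kB if the line
// is the VmRSS entry (e.g. "VmRSS:     12345 kB"), otherwise returns 0.
std::size_t parse_vmrss_kb(const std::string &line)
{
    const std::string key = "VmRSS:";
    if (line.compare(0, key.size(), key) != 0)
        return 0;
    std::istringstream fields(line.substr(key.size()));
    std::size_t kb = 0;
    fields >> kb; // skips leading whitespace, stops before " kB"
    return kb;
}

// Returns the resident set size of the calling process in kB, or 0 if
// /proc/self/status cannot be read (for example on a non-Linux system).
std::size_t current_rss_kb()
{
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
    {
        const std::size_t kb = parse_vmrss_kb(line);
        if (kb != 0)
            return kb;
    }
    return 0;
}

// Possible use inside the reproducer's loop (assumes <iostream> is included):
//   if (i % 5000 == 0)
//       std::cerr << "i=" << i << " rss=" << current_rss_kb() << " kB\n";
```

A value that climbs steadily across iterations would match the growth seen in the RES column of top, while a value that levels off after warm-up would suggest there is no host leak.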


Minor possible leak (probably unrelated): in NVJITLINK_SAFE_CALL, free(log) should be outside the inner if, since the malloc is outside it.
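One way to avoid the malloc/free mismatch altogether is to let a std::string own the log buffer, so it is released on every path. The sketch below is illustrative only: fetch_log is a hypothetical helper, and it takes the size query and the log query as callables so that it does not depend on the nvJitLink headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. get_size(&n) and get_log(buf) are caller-supplied
// callables that return true on success. The std::string owns the buffer,
// so nothing leaks even when the second call fails.
template <typename GetSize, typename GetLog>
std::string fetch_log(GetSize get_size, GetLog get_log)
{
    std::size_t size = 0;
    if (!get_size(&size) || size == 0)
        return std::string();

    std::string log(size, '\0');
    if (!get_log(&log[0]))
        return std::string();

    // Trim at the first NUL in case the reported size includes a terminator.
    const std::size_t nul = log.find('\0');
    if (nul != std::string::npos)
        log.resize(nul);
    return log;
}

// With nvJitLink this might be wired up as follows (not compiled here):
//   std::string log = fetch_log(
//       [&](std::size_t *n) {
//           return nvJitLinkGetErrorLogSize(handle, n) == NVJITLINK_SUCCESS; },
//       [&](char *buf) {
//           return nvJitLinkGetErrorLog(handle, buf) == NVJITLINK_SUCCESS; });
```

Because the macro calls exit(1) right after printing, the original leak is harmless in practice; the pattern matters more if the same code is reused somewhere that keeps running.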

  1. Sorry, please ignore that line.
  2. After many tests, I found that the problem occurs once the object file contains variables declared with __constant__. (The value of the RES column in top keeps increasing.)
  3. CUDA 12.4

I have created a repository to reproduce the issue. The memory usage keeps growing regardless of whether the object file contains constants declared with the __constant__ qualifier.

@Robert_Crovella Could you kindly help me take a look?


I suggest the following:

  1. retest on the latest available CUDA version, currently 12.6.3
  2. if the problem persists, file a bug.

This maps to NVBUG ID 4993750. We are investigating and are in contact with the requester. We will report the conclusion here.


We are glad to let you know that the memory leak has been fixed and verified internally against the case you provided. The fix is targeted for the CUDA release after next. In the meantime, compiling the kernel object without -rdc=true appears to work around the leak.
Thanks for reporting bugs to us.

