TensorRT 6 memory leak

I recently flashed a Jetson Xavier 32GB with JetPack 4.3 to be up to date with the new version of TensorRT.
However, while running my project’s tests I ran into a “GPU out of memory” error.
Since I am careful to use smart pointers to avoid leaks (I nonetheless checked the code base for possible leaks), I was able to pinpoint the problem to TensorRT 6.

Here is a minimal example of code that leaks memory even after the smart pointers’ deleters have called the destroy() method of nvinfer1::IRuntime and nvinfer1::IExecutionContext:

#include <iostream>
#include <array>
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <memory>
#include <experimental/filesystem>
#include <fstream>

template<typename T>
struct NvInferDestroyDelete {
    void operator()(T* t) {
        std::cout << "Destroying object" << std::endl;
        t->destroy();
    }
};

template<typename T>
using NvInferUniquePtr = std::unique_ptr<T, NvInferDestroyDelete<T>>;

class TRTLogger : public nvinfer1::ILogger {
public:
  void log(Severity severity, const char* msg) override
  {
      static const std::array<const char*, 5> type{
          {"Internal Error", "Error", "Warning", "Info", "Verbose"}};
      std::cout << '[' << type.at(static_cast<size_t>(severity)) << "] "
                << msg << '\n';
  }
};

std::streamoff stream_size(std::istream& f)
{
    std::istream::pos_type current_pos = f.tellg();
    if (-1 == current_pos) {
        return -1;
    }

    f.seekg(0, std::istream::end);
    std::istream::pos_type end_pos = f.tellg();
    f.seekg(current_pos);
    return end_pos - current_pos;
}

bool stream_read_string(std::istream& f, std::string& result)
{
    std::streamoff len = stream_size(f);
    if (len == -1) {
        return false;
    }
    result.resize(static_cast<std::string::size_type>(len));
    f.read(&result[0], len);
    return true;
}

std::string read_file(const std::experimental::filesystem::path& path)
{
    std::ifstream file(path, std::ios::binary);
    // disable skipping new lines in binary mode
    file.unsetf(std::ios::skipws);
    std::string result;
    if (!stream_read_string(file, result)) {
        throw std::runtime_error("Failed to read file");
    }

    return result;
}

auto get_gpuinfo()
{
    size_t free_byte;
    size_t total_byte;

    cudaMemGetInfo(&free_byte, &total_byte);

    double free_db = static_cast<double>(free_byte);
    double total_db = static_cast<double>(total_byte);
    double used_db = total_db - free_db;

    // report used memory in MiB
    return used_db / 1024.0 / 1024.0;
}

int main()
{
    TRTLogger logger;

    for(auto loop = 0; loop < 1000; ++loop)
    {
        std::cout << "Memory used: " << get_gpuinfo() << std::endl;
        auto infer = NvInferUniquePtr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
        std::string model = read_file("santis.onnx.engine");
        auto engine =
            infer->deserializeCudaEngine(model.data(), model.size(), nullptr);
        if (engine == nullptr) {
            throw std::runtime_error("Could not initialize engine");
        }

        auto context = NvInferUniquePtr<nvinfer1::IExecutionContext>(
            engine->createExecutionContext());
    }
    
    return 0;
}

The compilation is done with nvcc tensorrt_leak.cu -o tensorrt_leak -lnvinfer -lstdc++fs

Ubuntu 18.04
g++ 7.4.0
nvcc Cuda compilation tools, release 10.0, V10.0.326
TensorRT 6.0.1.10-1+cuda10.0
libnvinfer6 6.0.1-1+cuda10.0
santir_xavier.onnx.zip (44.7 MB)

Hi,

Thanks for reporting this.
We are checking this issue internally and will update you with more information later.

Thanks.

Hi,

It looks like the engine object is never destroyed.

Please check the updated source below:

for(auto loop = 0; loop < 1000; ++loop)
{
    std::cout << "Memory used: " << get_gpuinfo() << std::endl;
    std::string model = read_file("santis.onnx.engine");

    auto infer = NvInferUniquePtr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = NvInferUniquePtr<nvinfer1::ICudaEngine>(
        infer->deserializeCudaEngine(model.data(), model.size(), nullptr));

    if (engine == nullptr) {
        throw std::runtime_error("Could not initialize engine");
    }

    auto context = NvInferUniquePtr<nvinfer1::IExecutionContext>(
        engine->createExecutionContext());
}

Thanks

Hello !

Thanks for taking a look at the issue. Good catch on not freeing the engine’s memory, my mistake. After fixing that, unfortunately the leak persists. Here is the output of the first few iterations of the code:

$ ./tensorrt_leak 
Memory used: 5498.18
[Verbose] Deserialize required 2519148 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6684.97
[Verbose] Deserialize required 205900 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6688.71
[Verbose] Deserialize required 211395 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6691.5
[Verbose] Deserialize required 209501 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6694.8
[Verbose] Deserialize required 206750 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6697.29
[Verbose] Deserialize required 204854 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6700.66
[Verbose] Deserialize required 211915 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6703.35
[Verbose] Deserialize required 208156 microseconds.
Destroying object
^C

Hi,

Sorry that we cannot reproduce the leakage in our environment.
Here is our output log:

nvidia@xavier:~/topic_1071670$ ./tensorrt_leak 
Memory used: 5950.75
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 2897008 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7108.58
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 267792 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7106.69
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 267709 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7108.8
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265783 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.09
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 267489 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7108.93
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 266136 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.01
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265711 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7109.11
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265595 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.2
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 264652 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7109.81
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265457 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.51
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 266235 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7109.65
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 266796 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7096.85
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263160 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.45
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 262384 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7097.36
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263415 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.61
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263103 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7097.86
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263257 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.91
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263045 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7098.03
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263389 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7100.22
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263714 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7097.84
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263733 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.93
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263279 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7098.06
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263177 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7100.17
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263284 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7098.08

Did we miss anything?

Thanks.

Hello AastaLLL,

Thank you for following up on this. It is strange.

Could you tell me what your environment is? Which JetPack version?
How are you compiling the code I provided?

Could this be hardware related? I ask because there is a warning in your log that mentions “different models of devices”.

Hi,

I am using a Xavier 8GB with JetPack 4.3.
Let me try this on a standard Xavier and update you with more information.

The compile command is the same as the one you shared in comment #1.
Thanks.

Hi,

We can reproduce this issue in a Xavier 32GB environment.
This issue has been passed to our internal TensorRT team.

We will let you know once we get any feedback.
Thanks.

Hi Team
We see the same problem with JetPack 4.3 on a Jetson Nano with Python and OpenCV. A simple program that reads a video stream started to consume all the memory and we are getting out-of-memory errors. So far we have tried 4 different Python and OpenCV versions without any luck.

Pattern

  1. Slowness in displaying the video stream
  2. After 5 minutes, memory issues freeze the entire Nano
  3. The Nano is back to normal after the process is terminated

Thanks
Siva

Hi, infojk16g

This issue is related to a TensorRT memory leak, which is different from your use case.
Would you mind filing another topic specific to your problem?

Thanks.

Hi, federico.martinez

Thanks for your patience.

We found that the memory leak comes from the cuDNN library.
It is fixed in our latest internal cuDNN version (v8.x).

Please wait for our announcement of the new package.
Thanks.

Hi,

We want to verify whether this issue is fixed in our new cuDNN package.
Would you mind sharing the ONNX model with us so we can generate an engine file for our environment?

Thanks.