TensorRT 6 memory leak

I recently flashed a Jetson Xavier 32GB with JetPack 4.3 to be up to date with the new version of TensorRT.
However, while running my project’s tests I ran into a “GPU out of memory” error.
Since I am careful to use smart pointers to avoid leaks (I nonetheless checked the code base for possible leaks), I was able to pinpoint the problem to TensorRT 6.

Here is a minimal example of code that leaks memory even after the smart pointers’ deleters have called the destroy() method of nvinfer1::IRuntime and nvinfer1::IExecutionContext:

#include <iostream>
#include <array>
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <memory>
#include <experimental/filesystem>
#include <fstream>

template<typename T>
struct NvInferDestroyDelete {
    void operator()(T* t) {
        std::cout << "Destroying object" << std::endl;
        t->destroy();
    }
};

template<typename T>
using NvInferUniquePtr = std::unique_ptr<T, NvInferDestroyDelete<T>>;

class TRTLogger : public nvinfer1::ILogger {
public:
  void log(Severity severity, const char* msg) override
  {
      static const std::array<const char*, 5> type{
          {"Internal Error", "Error", "Warning", "Info", "Verbose"}};
      std::cout << '[' << type.at(static_cast<size_t>(severity)) << "] "
                << msg << '\n';
  }
};

std::streamoff stream_size(std::istream& f)
{
    std::istream::pos_type current_pos = f.tellg();
    if (-1 == current_pos) {
        return -1;
    }

    f.seekg(0, std::istream::end);
    std::istream::pos_type end_pos = f.tellg();
    f.seekg(current_pos);
    return end_pos - current_pos;
}

bool stream_read_string(std::istream& f, std::string& result)
{
    std::streamoff len = stream_size(f);
    if (len == -1) {
        return false;
    }
    result.resize(static_cast<std::string::size_type>(len));
    f.read(&result[0], len);
    return true;
}

std::string read_file(const std::experimental::filesystem::path& path)
{
    std::ifstream file(path, std::ios::binary);
    // disable skipping new lines in binary mode
    file.unsetf(std::ios::skipws);
    std::string result;
    if (!stream_read_string(file, result)) {
        throw std::runtime_error("Failed to read file");
    }

    return result;
}

auto get_gpuinfo()
{
    size_t free_byte;
    size_t total_byte;

    cudaMemGetInfo(&free_byte, &total_byte);

    double free_db = static_cast<double>(free_byte);
    double total_db = static_cast<double>(total_byte);
    double used_db = total_db - free_db;

    // report used memory in MiB
    return used_db / 1024.0 / 1024.0;
}

int main()
{
    TRTLogger logger;

    for(auto loop = 0; loop < 1000; ++loop)
    {
        std::cout << "Memory used: " << get_gpuinfo() << std::endl;
        auto infer = NvInferUniquePtr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
        std::string model = read_file("santis.onnx.engine");
        auto engine =
            infer->deserializeCudaEngine(model.data(), model.size(), nullptr);
        if (engine == nullptr) {
            throw std::runtime_error("Could not initialize engine");
        }

        auto context = NvInferUniquePtr<nvinfer1::IExecutionContext>(
            engine->createExecutionContext());
    }
    
    return 0;
}

The compilation is done with nvcc tensorrt_leak.cu -o tensorrt_leak -lnvinfer -lstdc++fs

Ubuntu 18.04
g++ 7.4.0
nvcc Cuda compilation tools, release 10.0, V10.0.326
TensorRT 6.0.1.10-1+cuda10.0
libnvinfer6 6.0.1-1+cuda10.0
santir_xavier.onnx.zip (44.7 MB)

Hi,

Thanks for reporting this.
We are checking this issue internally and will update you with more information later.

Thanks.

Hi,

It looks like the engine object is never destroyed.

Please check the updated source below:

for(auto loop = 0; loop < 1000; ++loop)
{
    std::cout << "Memory used: " << get_gpuinfo() << std::endl;
    std::string model = read_file("santis.onnx.engine");

    auto infer = NvInferUniquePtr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = NvInferUniquePtr<nvinfer1::ICudaEngine>(
        infer->deserializeCudaEngine(model.data(), model.size(), nullptr));

    if (engine == nullptr) {
        throw std::runtime_error("Could not initialize engine");
    }

    auto context = NvInferUniquePtr<nvinfer1::IExecutionContext>(
        engine->createExecutionContext());
}

Thanks

Hello !

Thanks for taking a look at the issue. Good catch on not freeing the engine’s memory, my mistake. After fixing that, unfortunately the leak persists. Here is the output of the first few iterations of the code:

$ ./tensorrt_leak 
Memory used: 5498.18
[Verbose] Deserialize required 2519148 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6684.97
[Verbose] Deserialize required 205900 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6688.71
[Verbose] Deserialize required 211395 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6691.5
[Verbose] Deserialize required 209501 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6694.8
[Verbose] Deserialize required 206750 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6697.29
[Verbose] Deserialize required 204854 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6700.66
[Verbose] Deserialize required 211915 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 6703.35
[Verbose] Deserialize required 208156 microseconds.
Destroying object
^C

Hi,

Sorry that we cannot reproduce the leakage in our environment.
Here is our output log:

nvidia@xavier:~/topic_1071670$ ./tensorrt_leak 
Memory used: 5950.75
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 2897008 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7108.58
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 267792 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7106.69
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 267709 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7108.8
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265783 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.09
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 267489 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7108.93
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 266136 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.01
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265711 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7109.11
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265595 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.2
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 264652 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7109.81
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 265457 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7107.51
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 266235 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7109.65
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 266796 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7096.85
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263160 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.45
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 262384 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7097.36
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263415 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.61
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263103 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7097.86
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263257 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.91
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263045 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7098.03
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263389 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7100.22
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263714 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7097.84
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263733 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7099.93
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263279 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7098.06
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263177 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7100.17
[Warning] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[Verbose] Deserialize required 263284 microseconds.
Destroying object
Destroying object
Destroying object
Memory used: 7098.08

Did we miss anything?

Thanks.

Hello AastaLLL,

Thank you for following up on this. It is strange.

Could you tell me what your environment is? Which JetPack version?
How are you compiling the code I provided?

Could this be hardware related? I ask because there is a warning in your log that mentions “different models of devices”.

Hi,

I am using a Xavier 8GB with JetPack 4.3.
Let me try this on a standard Xavier and update you with more information.

The compile command is the same as the one you shared in comment #1.
Thanks.

Hi,

We can reproduce this issue in a Xavier 32GB environment.
This issue has been passed to our internal TensorRT team.

We will let you know once we get any feedback.
Thanks.

Hi Team
We see the same problem with JetPack 4.3 on a Jetson Nano with Python and OpenCV. A simple program that reads a video stream started to consume all the memory and we are getting out-of-memory errors. So far we have tried 4 different Python and OpenCV versions without any luck.

Pattern

  1. Slowness in displaying the video stream
  2. After 5 minutes, memory issues freeze the entire Nano
  3. The Nano is back to normal after the process is terminated

Thanks
Siva

Hi, infojk16g

This issue is related to a TensorRT memory leak, which is different from your use case.
Would you mind filing another topic specific to your problem?

Thanks.

Hi, federico.martinez

Thanks for your patience.

We found that the memory leak comes from the cuDNN library.
It is fixed in our latest internal cuDNN version (v8.x).

Please wait for our announcement of the new package.
Thanks.

Hi,

We want to verify whether this issue is fixed in our new cuDNN package.
Would you mind sharing the ONNX model with us so we can generate an engine file for our environment?

Thanks.