context->setBindingDimensions causing GPU memory leak

yfjiaren · October 17, 2019, 9:24am

context->setBindingDimensions

This causes a GPU memory leak.

yfjiaren · October 17, 2019, 9:25am

@NVES_R

yfjiaren · October 17, 2019, 9:28am

for(int i=0; i < 1000000; ++i) {
    context->setBindingDimensions(inpIndex, myArray.inp_dims);
    context->enqueueV2(buffers, stream, nullptr);
}

This causes a GPU memory leak.

NVES_R · October 18, 2019, 12:55am

Hi,

Can you provide the full .cpp script for this small example?

Thanks,
NVIDIA Enterprise Support

yfjiaren · October 18, 2019, 7:07am

You can use any *.engine file, with dynamic or fixed input size, and run the following code to see that GPU memory usage keeps rising.

class MyArray {
    public:
        int inp_size{0};
        int out_size{0};
        int inp_bytes{0};
        int out_bytes{0};
        int time_step{0};
        Dims inp_dims{};
        Dims out_dims{};
        vector<float> inp;
        vector<float> out;

        void set_is_dynamic(bool is_dynamic) {
            if(is_dynamic) {
                inp = vector<float>(MAX_INP_SIZE);
                out = vector<float>(MAX_OUT_SIZE);
            }
            else {
                inp = vector<float>(FIX_INP_SIZE);
                out = vector<float>(FIX_OUT_SIZE);
            }
        }

        void setInpDims(const Dims &dims) {
            inp_size = 1;
            for(int i=0; i < dims.nbDims; ++i) {
                inp_size *= dims.d[i];
            }
            inp_dims = dims;
            inp_bytes = inp_size * sizeof(float);
        }

        void setOutDims(const Dims &dims) {
            out_size = 1;
            for(int i=0; i < dims.nbDims; ++i) {
                out_size *= dims.d[i];
            }
            out_dims = dims;
            out_bytes = out_size * sizeof(float);
        }
};

////////////////////////////////////////////////////////////////

class TrtHelper {
    public:
        TrtHelper(string engine_path, bool is_dynamic) {
            runtime = createInferRuntime(gLogger);
            assert(runtime != nullptr);

            stringstream ss;
            ss.seekg(0, ss.beg);
            // Open in binary mode so the serialized engine is read byte-for-byte.
            ifstream cache(engine_path, ios::binary);
            ss << cache.rdbuf();
            cache.close();
            ss.seekg(0, std::ios::end);
            const size_t size = static_cast<size_t>(ss.tellg());
            ss.seekg(0, std::ios::beg);
            void* memory = malloc(size);
            ss.read((char*)memory, size);
            engine = runtime->deserializeCudaEngine(memory, size, nullptr);
            free(memory);
            assert(engine != nullptr);

            context = engine->createExecutionContext();
            assert(context != nullptr);
            if(is_dynamic) { context->setOptimizationProfile(0); }

            CHECK(cudaStreamCreate(&stream));
            cout << "Loaded " << engine_path << " ..." << endl;

            assert(engine->getNbBindings() == 2);
            inpIndex = engine->getBindingIndex(INP_NAME.c_str());
            outIndex = engine->getBindingIndex(OUT_NAME.c_str());

            auto sf = sizeof(float);
            if(is_dynamic) {
                CHECK(cudaMalloc(&buffers[inpIndex], MAX_INP_SIZE*sf));
                CHECK(cudaMalloc(&buffers[outIndex], MAX_OUT_SIZE*sf));
            }
            else {
                CHECK(cudaMalloc(&buffers[inpIndex], FIX_INP_SIZE*sf));
                CHECK(cudaMalloc(&buffers[outIndex], FIX_OUT_SIZE*sf));
            }

            cout << "Created cuda buffers ..." << endl;
        }

        ~TrtHelper() {
            if(context) { context->destroy(); }
            if(engine) { engine->destroy(); }
            if(runtime) { runtime->destroy(); }
            if(stream) { cudaStreamDestroy(stream); }

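            // Check, free and reset the same slot (the original mixed 0/1 with inpIndex/outIndex).
            if(buffers[inpIndex]) { CHECK(cudaFree(buffers[inpIndex])); buffers[inpIndex] = nullptr; }
            if(buffers[outIndex]) { CHECK(cudaFree(buffers[outIndex])); buffers[outIndex] = nullptr; }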
        }

        void inference(MyArray &myArray) {
            context->setBindingDimensions(inpIndex, myArray.inp_dims);
            myArray.setOutDims(context->getBindingDimensions(outIndex));

            CHECK(cudaMemcpyAsync(buffers[inpIndex], myArray.inp.data(),
                        myArray.inp_bytes, cudaMemcpyHostToDevice, stream));
            cout << "Memory debug beg ..." << endl;
            for(int i=0; i < 1000000000; ++i) {
                context->setBindingDimensions(inpIndex, myArray.inp_dims);
                context->enqueueV2(buffers, stream, nullptr);
                cudaStreamSynchronize(stream);
            }
            cout << "Memory debug end ..." << endl;
            CHECK(cudaMemcpyAsync(myArray.out.data(), buffers[outIndex],
                        myArray.out_bytes, cudaMemcpyDeviceToHost, stream));
            cudaStreamSynchronize(stream);
        }

    private:
        int inpIndex{0};
        int outIndex{0};
        void* buffers[2]{};

        cudaStream_t stream{nullptr};
        IRuntime *runtime{nullptr};
        ICudaEngine *engine{nullptr};
        IExecutionContext *context{nullptr};
};

        ImgHelper img(BLOCK_W, max_inp_w, FIX_INP_H, PAD_VALUE);
        for(auto fip: load_files(input_path)) {
            // cout << ip << endl;
            if(!img.load_img(fip, myArray, is_dynamic)) {
                continue;
            }

            trt.inference(myArray);
            string text = ctc.greedy_ctc(myArray);
            cout << "The text of file '" << fip << "' is: " << text << endl;
        }

I cannot share the whole code, but I think the code above is enough as an example. You can treat !img.load_img(fip, myArray, is_dynamic) as simply copying image data into myArray.inp.
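
A side note on the size bookkeeping in MyArray: in a dynamic engine, a dimension that has not been resolved yet is reported as -1, so multiplying the entries blindly can give a negative element count. The sketch below is one possible guard, not part of the original code. SketchDims is a hypothetical stand-in for nvinfer1::Dims so that it builds without TensorRT, and the helper name volume is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for nvinfer1::Dims (same two fields as used above).
struct SketchDims {
    int nbDims{0};
    int d[8]{};
};

// Returns the number of elements described by dims, or -1 if any
// dimension is still unresolved (negative), e.g. a dynamic axis.
std::int64_t volume(const SketchDims& dims) {
    assert(dims.nbDims >= 0 && dims.nbDims <= 8);
    std::int64_t count = 1;  // 64-bit to avoid overflowing int on large shapes
    for (int i = 0; i < dims.nbDims; ++i) {
        if (dims.d[i] < 0) {
            return -1;
        }
        count *= dims.d[i];
    }
    return count;
}
```

With such a helper, setInpDims and setOutDims could refuse to update inp_bytes / out_bytes when the result is -1 instead of storing a negative byte count.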

yfjiaren · October 18, 2019, 7:15am

Case 1:

for(int i=0; i < 1000000; ++i) {
    context->setBindingDimensions(inpIndex, myArray.inp_dims);
    context->enqueueV2(buffers, stream, nullptr);
}

Case 2:

for(int i=0; i < 1000000; ++i) {
    context->enqueueV2(buffers, stream, nullptr);
}

Case 3:

for(int i=0; i < 1000000; ++i) {
    context->setBindingDimensions(inpIndex, myArray.inp_dims);
}

Only case 1 causes memory usage to rise; cases 2 and 3 are fine.
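
For anyone who wants to compare the three cases with numbers rather than by watching nvidia-smi, here is a minimal sketch of a measurement harness. It is only an illustration: GrowthReport and measureGrowth are hypothetical names, and the memory probe and the loop body are passed in as callables so that the sketch compiles without CUDA or TensorRT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Used-memory samples taken before and after the loop, in bytes.
struct GrowthReport {
    std::size_t before{0};
    std::size_t after{0};
    // Positive delta means usage grew while the loop ran.
    long long delta() const {
        return static_cast<long long>(after) - static_cast<long long>(before);
    }
};

// Samples usedBytes(), runs body() `iterations` times, samples again.
// Assumption: in a real run usedBytes would wrap cudaMemGetInfo and
// return total - free, and body would be one of the three loop bodies.
GrowthReport measureGrowth(const std::function<std::size_t()>& usedBytes,
                           const std::function<void()>& body,
                           int iterations) {
    assert(iterations > 0);
    GrowthReport report;
    report.before = usedBytes();
    for (int i = 0; i < iterations; ++i) {
        body();
    }
    report.after = usedBytes();
    return report;
}
```

In a real test the body should end with cudaStreamSynchronize(stream) so that pending work does not distort the second sample. Note that cudaMemGetInfo reports device-wide numbers, so other processes on the same GPU also affect the delta.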

NVES_R · November 1, 2019, 8:32pm

Just got an answer: they caught this in a similar bug and have fixed it for the next release. Thanks for pointing this out.

yfjiaren · December 3, 2019, 7:36am

Has the fixed version been released yet?

NVES_R · December 3, 2019, 9:58pm

Hi yfjiaren,

Not yet. Sorry, but I can’t release the timeline.

nilshinn · December 23, 2019, 12:51pm

The GPU memory leak is fixed in TensorRT 7,
but the slow performance is the same as before.

NVES_R · December 23, 2019, 9:12pm

Hi yfjiaren,

TRT7 has been released and should fix the memory leak issue.

flazerain · December 24, 2019, 2:48am

I found that TensorRT 7 also has this problem.

I use the Python API, and every time I call context.set_binding_shape and context.execute_async_v2, GPU memory grows until it runs out.

But it doesn’t happen with one of my other, simpler networks.

nilshinn · December 24, 2019, 7:18am

The GPU memory leak is fixed in TensorRT 7,
but the slow performance is the same as before.
I use a P100.

echosmask · December 26, 2019, 1:56am

@NVES_R, @yfjiaren, @nilshinn

Indeed, heavy ResNet-like networks built with the TRT Python API leak GPU memory each time I call context.set_binding_shape and context.execute_async_v2, so TensorRT 7.0 still has this problem. My environment is as follows:

Tesla P4
CUDA 10.0.130
TensorRT 7.0.0.11

Update

The Tesla P4 leaks memory. The RTX 2080 Ti does not.

flazerain · January 2, 2020, 7:29am

CUDA 10.0.130
TensorRT 7.0.0.11
GTX 1060
retinanet-resnet50

Update:
The V100 does not leak but is slow (same as PyTorch).

NVES_R · January 2, 2020, 5:30pm

Thanks for the updates everyone, looking into this.

NVES_R · February 14, 2020, 12:33am

This issue has been fixed upstream and should be included in the next release.

fran6co · March 10, 2020, 9:03am

Could you release a hotfix? This is blocking production code for us. Thank you

patrick.beaulieu · April 1, 2020, 6:43pm

@NVES_R
When will this fix be released? Can a TRT 7 hotfix be released prior to TRT 8?

We are a big customer and cannot upgrade from an older TensorRT in released products, because this issue causes application instability.

pierre.massat · April 1, 2020, 6:46pm

@NVES_R I would love to see that fix released very soon too. We’re having this problem and it’s blocking us from upgrading to TRT 7.