Linux distro and version: Ubuntu 16.04
GPU type: TITAN Xp
NVIDIA driver version: 418.87.00
CUDA version: 10.1
cuDNN version: 7.6.3
TensorRT version: 6.0.1.5
After recently upgrading to TensorRT 6, we’ve been noticing memory leak warnings which didn’t appear in TensorRT 5.
Leak reports take the following form (from Valgrind’s Memcheck):
==23092== 2,408 bytes in 1 blocks are definitely lost in loss record 2,252 of 2,645
==23092==    at 0x402DE03: malloc (vg_replace_malloc.c:299)
==23092==    by 0x10E03D03: ??? (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x1059B0C7: nvinfer1::rt::cuda::WinogradConvActRunner::updateConvolution(dit::Convolution*, nvinfer1::rt::CommonContext const&, signed char const*, nvinfer1::utils::TensorLayout const&, nvinfer1::utils::TensorLayout const&) const (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x1059B26A: nvinfer1::rt::cuda::WinogradConvActRunner::recomputeResources(nvinfer1::rt::CommonContext const&) (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x1071E509: nvinfer1::rt::SafeEngine::initialize(nvinfer1::rt::CommonContext&, std::vector<nvinfer1::rt::EngineLayerAttribute, std::allocator<nvinfer1::rt::EngineLayerAttribute> > const&) (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x10535688: nvinfer1::rt::Engine::initialize(std::vector<nvinfer1::rt::EngineLayerAttribute, std::allocator<nvinfer1::rt::EngineLayerAttribute> > const&) (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x107093AA: ??? (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x1070ABDC: nvinfer1::builder::buildEngine(nvinfer1::NetworkBuildConfig&, nvinfer1::builder::EngineBuildContext const&, nvinfer1::Network const&) (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x105CCA2A: nvinfer1::builder::Builder::buildInternal(nvinfer1::NetworkBuildConfig&, nvinfer1::builder::EngineBuildContext const&, nvinfer1::Network const&) (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
==23092==    by 0x105CD909: nvinfer1::builder::Builder::buildEngineWithConfig(nvinfer1::INetworkDefinition&, nvinfer1::IBuilderConfig&) (in /home/tom/projects/wraw/build/private/libnvinfer.so.6)
We’re careful to destroy all TensorRT objects (by wrapping them in smart pointers which call destroy() when they go out of scope). In particular, I’m sure that all of the TensorRT objects shown above (the builder, the builderConfig, the networkDefinition, and the engine) have had destroy() called on them. We also use Plugin layers, but the leaked allocation shown above points to TensorRT and not our code (we never use raw malloc anyway, and this was from a debug build). Who is responsible for freeing the above memory?
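The smart-pointer wrapping described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the actual wrapper code: FakeTrtObject is a hypothetical stand-in for a TensorRT interface that is released through destroy(), and TrtDestroyer, TrtUniquePtr, and destroyCallsAfterScopeExit are made-up names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a TensorRT object (builder, engine, ...) that
// must be released through destroy() rather than delete.
struct FakeTrtObject {
    explicit FakeTrtObject(int* counter) : destroyCount(counter) {}

    // Records the call, then frees the object, mimicking a destroy() method.
    void destroy() {
        ++*destroyCount;
        delete this;
    }

    int* destroyCount;
};

// Deleter that calls destroy() on any wrapped object (illustrative name).
struct TrtDestroyer {
    template <typename T>
    void operator()(T* obj) const noexcept {
        if (obj) {
            obj->destroy();
        }
    }
};

// unique_ptr alias that releases objects via destroy() on scope exit.
template <typename T>
using TrtUniquePtr = std::unique_ptr<T, TrtDestroyer>;

// Returns how many times destroy() ran once a wrapped object left scope.
int destroyCallsAfterScopeExit() {
    int calls = 0;
    {
        TrtUniquePtr<FakeTrtObject> obj(new FakeTrtObject(&calls));
        assert(calls == 0);  // destroy() has not run while obj is alive
    }  // obj goes out of scope here, so TrtDestroyer calls destroy()
    return calls;
}
```

With wrappers of this shape, every wrapped object has destroy() called exactly once when its owner goes out of scope, so an allocation that is still reported as lost would have to be one made and held inside the library.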
I’m attempting to create a minimal repro for you, but it could be tricky (we use the network builder interface, use Plugin layers, wrap TensorRT quite a bit, and have proprietary models).
Regards,
Tom Peters