TensorRT on Windows: error when loading custom plugins from a deserialized engine

Description

I am trying to port a TensorRT-based inference library with custom plugins from Linux to Windows. I can successfully build the TensorRT engine in INT8 and FP32 formats, but when I try to deserialize and run the engine I run into a memory bug whose cause I cannot figure out.

    pluginFactory = new PluginFactory();
    runtimeRT = createInferRuntime(loggerRT);
    engineRT = runtimeRT->deserializeCudaEngine(gieModelStream, size, (IPluginFactory *) pluginFactory);

gieModelStream is the serialized engine, which has been loaded from the binary file. The plugin data is then passed into:

    IPlugin* PluginFactory::createPlugin(const char* layerName, const void* serialData, size_t serialLength) {
        const char * buf = reinterpret_cast<const char*>(serialData);
        r->w = readBUF<int>(buf);
        for(int i=0; i<r->n_masks; i++)
            r->mask[i] = readBUF<dnnType>(buf); // error occurs in this line
        for(int i=0; i<r->n_masks*2*r->num; i++)
            r->bias[i] = readBUF<dnnType>(buf);

The error message points at buf and reads "unable to access memory", and the value of i at which the exception is raised changes from run to run. I tried running my modified code on Linux and it works perfectly, but on Windows I hit the error above regardless of which model I choose (as long as it contains this specific layer). Can someone please help me understand what is going on here?
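One way to turn this kind of crash into a diagnosable error is to check every read against the serialLength argument that createPlugin already receives. The sketch below is only an illustration: readChecked is a hypothetical helper, not part of TensorRT or of the library discussed here, and it assumes the data is a flat sequence of trivially copyable values.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical bounds-checked reader (not an existing API). It copies one T
// out of the buffer, advances the cursor, and throws instead of reading past
// the end of the plugin data.
template <typename T>
T readChecked(const char*& cur, const char* end) {
    if (static_cast<std::size_t>(end - cur) < sizeof(T))
        throw std::out_of_range("plugin buffer exhausted");
    T value;
    std::memcpy(&value, cur, sizeof(T));  // memcpy is safe for unaligned data
    cur += sizeof(T);
    return value;
}
```

Inside createPlugin the end pointer would be buf + serialLength. If the exception fires, the reader is asking for more bytes than were serialized, which narrows the search to a mismatch between the write side and the read side.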

Environment

TensorRT Version: 7.2.3.4/7.2.2.3
GPU Type: GTX 1070 Max-Q
Nvidia Driver Version:
CUDA Version: 11.1
CUDNN Version: 8.1.1/8.0.5
Operating System + Version: Windows 10 Pro 20H2
Python Version (if applicable): None
TensorFlow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if container which image + tag): None

Hi,
Please refer to the links below on custom plugin implementation and the sample:

Thanks!

Yes, I did use those as my reference, but I really can't find what is wrong, because a memory check on Linux reports no memory errors. On Windows the error shows up whenever I load the custom plugins from the deserialized engine, and the fact that it occurs at different values of "i" makes it even weirder. My code mostly uses raw pointers; will switching to smart pointers remove this error? I would also like to know what fundamentally causes this type of error.

Hi @harshvardhanchandira,

Could you please share more details of the error (logs) and scripts that reproduce the issue, for better debugging?

Thank you.

This is the log from running it with AddressSanitizer enabled on Windows:

detection
yolo4tiny_fp32.rt
New NetworkRT (TensorRT v7.23)
Float16 support: 0
Int8 support: 1
DLAs: 0
=================================================================
==12884==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x12494e6c7288 at pc 0x7ffe44e7ca55 bp 0x0052a71d0920 sp 0x0052a71d0928
READ of size 4 at 0x12494e6c7288 thread T0
    #0 0x7ffe44e7ca54 in tk::dnn::readBUF<float>(char const *&) C:\Users\perseusdg\Development\tkdnn-windows\include\tkDNN\NetworkRT.h:20
    #1 0x7ffe44e5de88 in tk::dnn::PluginFactory::createPlugin(char const *, void const *, unsigned __int64) C:\Users\perseusdg\Development\tkdnn-windows\src\NetworkRT.cpp:753
    #2 0x7ffe0e9fd811 in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x18028d811)
    #3 0x7ffe0e9f68fd in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802868fd)
    #4 0x7ffe0e9f7ee7 in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x180287ee7)
    #5 0x7ffe0ea5a383 in nvinfer1::setAllocationTracking(struct nvinfer1::CpuGpuPair<bool>) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802ea383)
    #6 0x7ffe0ea074fa in nvinfer1::decodeContextDims(class nvinfer1::Dims &, class nvinfer1::IExecutionContext const *) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802974fa)
    #7 0x7ffe0ea17d2d in nvinfer1::setRuntimeProfiler(class nvinfer1::IExecutionContext &, class nvinfer1::IRuntimeProfiler *) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802a7d2d)
    #8 0x7ffe44e78bd9 in tk::dnn::NetworkRT::deserialize(char const *) C:\Users\perseusdg\Development\tkdnn-windows\src\NetworkRT.cpp:632
    #9 0x7ffe44e645b0 in tk::dnn::NetworkRT::NetworkRT(class tk::dnn::Network *, char const *) C:\Users\perseusdg\Development\tkdnn-windows\src\NetworkRT.cpp:151
    #10 0x7ffe44ec3b23 in tk::dnn::Yolo3Detection::init(class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> const &, int, int, float) C:\Users\perseusdg\Development\tkdnn-windows\src\Yolo3Detection.cpp:10
    #11 0x7ff662c72bac in main C:\Users\perseusdg\Development\tkdnn-windows\demo\demo\demo.cpp:75
    #12 0x7ff662c96f18 in invoke_main D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #13 0x7ff662c96e6d in __scrt_common_main_seh D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #14 0x7ff662c96d2d in __scrt_common_main D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:330
    #15 0x7ff662c96f8d in mainCRTStartup D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:16
    #16 0x7ffead9b54ed  (C:\WINDOWS\System32\KERNEL32.DLL+0x1800154ed)
    #17 0x7ffeae3bcd8a  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18007cd8a)

0x12494e6c728b is located 0 bytes to the right of 20619-byte region [0x12494e6c2200,0x12494e6c728b)
allocated by thread T0 here:
    #0 0x7ffe452ba3d8  (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\Hostx64\x64\clang_rt.asan_dbg_dynamic-x86_64.dll+0x18004a3d8)
    #1 0x7ffe0fa60d17 in cask_trt::WeightGradientShader::isNhwcOutput(void) const (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1812f0d17)
    #2 0x7ffe0f552926 in cask_trt::SafeEnum<struct cask_trt::GemmImpl::Traits_ENUMCLASS_SCOPEWRAPPER>::SafeEnum<struct cask_trt::GemmImpl::Traits_ENUMCLASS_SCOPEWRAPPER>(enum cask_trt::GemmImpl::Traits_ENUMCLASS_SCOPEWRAPPER::Label) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x180de2926)
    #3 0x7ffe0e82a1ce  (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1800ba1ce)
    #4 0x7ffe0e9e2d03 in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x180272d03)
    #5 0x7ffe0e9fd68b in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x18028d68b)
    #6 0x7ffe0e9f68fd in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802868fd)
    #7 0x7ffe0e9f7ee7 in nvinfer1::utils::transposeSubBuffers(void *, enum nvinfer1::DataType, int, int, int) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x180287ee7)
    #8 0x7ffe0ea5a383 in nvinfer1::setAllocationTracking(struct nvinfer1::CpuGpuPair<bool>) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802ea383)
    #9 0x7ffe0ea074fa in nvinfer1::decodeContextDims(class nvinfer1::Dims &, class nvinfer1::IExecutionContext const *) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802974fa)
    #10 0x7ffe0ea17d2d in nvinfer1::setRuntimeProfiler(class nvinfer1::IExecutionContext &, class nvinfer1::IRuntimeProfiler *) (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvinfer.dll+0x1802a7d2d)
    #11 0x7ffe44e78bd9 in tk::dnn::NetworkRT::deserialize(char const *) C:\Users\perseusdg\Development\tkdnn-windows\src\NetworkRT.cpp:632
    #12 0x7ffe44e645b0 in tk::dnn::NetworkRT::NetworkRT(class tk::dnn::Network *, char const *) C:\Users\perseusdg\Development\tkdnn-windows\src\NetworkRT.cpp:151
    #13 0x7ffe44ec3b23 in tk::dnn::Yolo3Detection::init(class std::basic_string<char, struct std::char_traits<char>, class std::allocator<char>> const &, int, int, float) C:\Users\perseusdg\Development\tkdnn-windows\src\Yolo3Detection.cpp:10
    #14 0x7ff662c72bac in main C:\Users\perseusdg\Development\tkdnn-windows\demo\demo\demo.cpp:75
    #15 0x7ff662c96f18 in invoke_main D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #16 0x7ff662c96e6d in __scrt_common_main_seh D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #17 0x7ff662c96d2d in __scrt_common_main D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:330
    #18 0x7ff662c96f8d in mainCRTStartup D:\a01\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:16
    #19 0x7ffead9b54ed  (C:\WINDOWS\System32\KERNEL32.DLL+0x1800154ed)
    #20 0x7ffeae3bcd8a  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18007cd8a)

SUMMARY: AddressSanitizer: heap-buffer-overflow C:\Users\perseusdg\Development\tkdnn-windows\include\tkDNN\NetworkRT.h:20 in tk::dnn::readBUF<float>(char const *&)
Shadow bytes around the buggy address:
  0x043e78058e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x043e78058e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x043e78058e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x043e78058e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x043e78058e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x043e78058e50: 00[03]fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043e78058e60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043e78058e70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043e78058e80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043e78058e90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043e78058ea0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
AddressSanitizer: nested bug in the same thread, aborting.
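
The numbers in the report can be cross-checked with a little arithmetic. The sketch below only restates addresses copied from the log above (the constant names are made up for illustration). It shows that the faulting 4-byte read starts inside the allocated region and runs one byte past its end, which matches the [03] shadow byte: only 3 bytes of that 8-byte granule are addressable.

```cpp
#include <cstdint>

// Addresses copied from the AddressSanitizer report.
constexpr std::uint64_t regionBegin = 0x12494e6c2200;
constexpr std::uint64_t regionEnd   = 0x12494e6c728b;  // exclusive end
constexpr std::uint64_t readAddr    = 0x12494e6c7288;
constexpr std::uint64_t readSize    = 4;               // READ of size 4

// The region is 20619 bytes long, as the report states.
static_assert(regionEnd - regionBegin == 20619, "region size");

// Only 3 valid bytes remain at the faulting address.
static_assert(regionEnd - readAddr == 3, "bytes left in region");

// The read therefore overruns the region by exactly 1 byte.
static_assert(readAddr + readSize - regionEnd == 1, "overrun");

// Offset of the faulting read from the start of the region.
constexpr std::uint64_t readOffset = readAddr - regionBegin;
static_assert(readOffset == 20616, "read offset");
```

In other words, the reader walked through the whole region in valid steps and only failed at the very end. That is consistent with the read loop asking for more elements than the buffer holds, although the report alone does not say why.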

This is how the buffer is read in the PluginFactory:

    IPlugin* PluginFactory::createPlugin(const char* layerName, const void* serialData, size_t serialLength) {
        const char * buf = reinterpret_cast<const char*>(serialData);

The part of the PluginFactory function where the error occurs:

    if (name.find("Yolo") == 0) {
        YoloRT *r = new YoloRT(readBUF<int>(buf),    // classes
                               readBUF<int>(buf),    // num
                               nullptr,              // yolo
                               readBUF<int>(buf),    // n_masks
                               readBUF<float>(buf),  // scale_xy
                               readBUF<float>(buf),  // nms_thresh
                               readBUF<int>(buf),    // nms_kind
                               readBUF<int>(buf));   // new_coords
        r->c = readBUF<int>(buf);
        r->h = readBUF<int>(buf);
        r->w = readBUF<int>(buf);
        for (int i = 0; i < r->n_masks; i++)
            r->mask[i] = readBUF<dnnType>(buf);  // the error occurs here when reading buf; the value of i at which it occurs is not always the same
        for (int i = 0; i < r->n_masks * 2 * r->num; i++)
            r->bias[i] = readBUF<dnnType>(buf);

The above works perfectly well on Linux. The engine builds fine on Windows, but when I deserialize it and try to use it, the issue appears.
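
One platform-dependent detail in the snippet above is worth ruling out. C++ does not specify the order in which the arguments of a function or constructor call are evaluated, so the seven readBUF calls inside new YoloRT(...) may consume the buffer in a different order depending on the compiler. If that happens, n_masks can receive the bits of another field and the loop bound becomes meaningless. Whether this is the cause here is not established in the thread. The sketch below reads each field in its own statement so the order is fixed everywhere. The readBUF body is an assumption (the real definition in NetworkRT.h is not shown), YoloHeader is a hypothetical holder, and the field order is taken from the comments in the snippet.

```cpp
#include <cstring>

// Assumed shape of readBUF: copy one T from the buffer and advance it.
template <typename T>
T readBUF(const char*& buf) {
    T value;
    std::memcpy(&value, buf, sizeof(T));
    buf += sizeof(T);
    return value;
}

// Hypothetical holder for the values passed to the YoloRT constructor.
struct YoloHeader {
    int classes, num, n_masks;
    float scale_xy, nms_thresh;
    int nms_kind, new_coords;
};

// One read per statement: statements run in source order on every compiler,
// unlike the arguments of a single call expression.
YoloHeader readYoloHeader(const char*& buf) {
    YoloHeader h;
    h.classes    = readBUF<int>(buf);
    h.num        = readBUF<int>(buf);
    h.n_masks    = readBUF<int>(buf);
    h.scale_xy   = readBUF<float>(buf);
    h.nms_thresh = readBUF<float>(buf);
    h.nms_kind   = readBUF<int>(buf);
    h.new_coords = readBUF<int>(buf);
    return h;
}
```

The constructor call would then take h.classes, h.num, nullptr, h.n_masks and so on, with no reads left inside the argument list. Printing h.n_masks and h.num on both platforms right after this step is a cheap way to confirm or dismiss the theory.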

Hi @harshvardhanchandira,

Could you please double-check the plugin serialization code to make sure it is platform-agnostic and works as expected?

Thank you.
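
A simple way to act on that suggestion is a round-trip check: serialize a small block, read it back, and compare the number of bytes consumed with the number produced. The sketch below is a minimal illustration under stated assumptions. writeBUF, readBUF, MaskBlock, serialize and deserialize are hypothetical names rather than the library's real code, and only the mask part of the layer is modelled. Fixed-width types are used because some built-in types differ between platforms (long is 8 bytes with GCC on 64-bit Linux but 4 bytes with MSVC).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical writer: append the raw bytes of one value.
template <typename T>
void writeBUF(std::vector<char>& out, const T& value) {
    const char* p = reinterpret_cast<const char*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Hypothetical reader: copy one value out and advance the cursor.
template <typename T>
T readBUF(const char*& buf) {
    T value;
    std::memcpy(&value, buf, sizeof(T));
    buf += sizeof(T);
    return value;
}

// Illustrative subset of the layer state; n_masks must equal mask.size().
struct MaskBlock {
    std::int32_t n_masks;
    std::vector<float> mask;
};

std::vector<char> serialize(const MaskBlock& b) {
    std::vector<char> out;
    writeBUF<std::int32_t>(out, b.n_masks);
    for (float m : b.mask) writeBUF<float>(out, m);
    return out;
}

// Returns the number of bytes consumed, to be compared with serialLength.
std::size_t deserialize(const char* data, MaskBlock& b) {
    const char* cur = data;
    b.n_masks = readBUF<std::int32_t>(cur);
    b.mask.clear();
    for (std::int32_t i = 0; i < b.n_masks; ++i)
        b.mask.push_back(readBUF<float>(cur));
    return static_cast<std::size_t>(cur - data);
}
```

If the value returned by deserialize differs from the size of the serialized buffer on one platform but not the other, the write and read paths disagree about the layout on that platform.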