Int8 Calibration for semantic segmentation crash

Dear NV Team,
I followed the example from https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/ and was able to run INT8 semantic segmentation with 20 classes on my workstation with a GTX 1080 Ti and on the DPX2 AutoChauffeur. The configuration of the DPX2 after flashing with DriveInstall_5.0.5.0bL_SDK_b3 is as follows:

  • Ubuntu 16.04 LTS
  • CUDA 9.0
  • cuDNN 7.0.4.31
  • TensorRT 3.0.2

I then decided to reduce the number of semantic classes to 6 to improve the runtime, since I was not able to reach the 50 ms runtime described in https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/. I was able to train and run my semantic segmentation with 6 classes in Caffe on the GTX 1080 Ti. However, the INT8 TensorRT engine crashes while building the engine (ICudaEngine* engine = builder->buildCudaEngine(*network)) on the DPX2 AutoChauffeur. I am able to save the calibration table, but after that I get the following error: “free(): invalid next size (fast)” and the system crashes.

I then used Valgrind (see http://valgrind.org/) to monitor memory while building the engine on the DPX2 AutoChauffeur. The log file shows the following:

==932== Invalid write of size 8
==932== at 0x4C21048: nvinfer1::task::caskLayer::computeEltwiseScales(std::vector<float, std::allocator >&) const (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C2B0BF: nvinfer1::task::caskConvolutionLayer::allocateResources(nvinfer1::cudnn::CommonContext const&) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C442AF: nvinfer1::cudnn::selectFastestLayerAndDeleteOthers(nvinfer1::cudnn::EngineBuildContext&, std::vector<nvinfer1::cudnn::Layer*, std::allocatornvinfer1::cudnn::Layer* > const&) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C3935B: nvinfer1::builder::buildSingleLayer(nvinfer1::cudnn::EngineBuildContext&, nvinfer1::builder::Node&, std::unordered_map<std::string, std::unique_ptr<nvinfer1::cudnn::Region, std::default_deletenvinfer1::cudnn::Region >, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::unique_ptr<nvinfer1::cudnn::Region, std::default_deletenvinfer1::cudnn::Region > > > > const&, nvinfer1::CpuMemoryGroup&, std::unordered_map<std::string, std::vector<float, std::allocator >, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::vector<float, std::allocator > > > >, bool) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C3B67F: nvinfer1::builder::EngineTacticSupply::getBestTactic(nvinfer1::builder::Node&, nvinfer1::query::Portsnvinfer1::RegionFormatL const&, bool) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C0AF07: nvinfer1::builder::(anonymous namespace)::LeafCNode::computeCosts(nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator >, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::vector<float, std::allocator > > > >) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C0F54F: nvinfer1::builder::chooseFormatsAndTactics(nvinfer1::builder::Graph&, nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator >, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::vector<float, std::allocator > > > >) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C3C477: nvinfer1::builder::makeEngineFromGraph(nvinfer1::CudaEngineBuildConfig const&, nvinfer1::cudnn::HardwareContext const&, nvinfer1::builder::Graph&, std::unordered_map<std::string, std::vector<float, std::allocator >, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::vector<float, std::allocator > > > >, int) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4C3DE3F: nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::cudnn::HardwareContext const&, nvinfer1::Network const&) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x4BC9C07: nvinfer1::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) (in /usr/lib/aarch64-linux-gnu/libnvinfer.so.4.0.2)
==932== by 0x40BDAF: caffeToGIEModel_Calib(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, unsigned int, nvinfer1::IInt8Calibrator*, nvinfer1::IHostMemory**, std::ostream&) (sampleFCN_PeP.cpp:502)
==932== by 0x40E2F7: startSemSeg() (sampleFCN_PeP.cpp:1249)
==932== by 0x40F8A7: main (sampleFCN_PeP.cpp:1658)
==932== Address 0x67f8a908 is 0 bytes after a block of size 24 alloc'd
==932== at 0x4845108: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
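
One way to read the numbers in this report, offered only as a sketch: Valgrind says an 8-byte write landed exactly at the end of a 24-byte heap block. 24 bytes happens to equal 6 floats, which matches the new class count, so it is plausible (but not confirmed by the log) that the block is a per-class buffer of float values. The helper names scaleBufferBytes and writeFitsInBlock below are hypothetical and are not part of TensorRT; they only reproduce the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of a buffer holding one float per output class.
// Assumption: one float per class; the Valgrind log does not confirm
// what the 24-byte block actually stores.
constexpr std::size_t scaleBufferBytes(std::size_t numClasses) {
    return numClasses * sizeof(float);
}

// True when a write of writeBytes starting at byte offset `offset`
// stays entirely inside a heap block of blockBytes.
constexpr bool writeFitsInBlock(std::size_t blockBytes,
                                std::size_t offset,
                                std::size_t writeBytes) {
    return offset <= blockBytes && writeBytes <= blockBytes - offset;
}
```

On a platform where float is 4 bytes, scaleBufferBytes(6) is 24, and writeFitsInBlock(24, 24, 8) is false, which is the situation Valgrind describes ("0 bytes after a block of size 24"). This does not identify the cause inside libnvinfer; it only shows that the reported sizes are consistent with an out-of-bounds write past a 6-element float buffer.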

Can you please tell me how to solve this problem?
Thank you in advance.
Best Regards

Hello,

We recommend that our DPX2 customers upgrade to the latest SDK for DRIVE PX2, which was posted to the developer site this week.


DRIVE OS 5.0.10.3 Linux SDK for DRIVE PX 2
TensorRT 4.0.0.8RC

This is the final SDK for DRIVE PX2, and we plan to end support for it this year.