Segmentation fault during INT8 calibration of an object detection model using TF-TRT

Hi, I’m trying to perform TF-TRT INT8 optimization of a RetinaNet-like object detection model trained with the TensorFlow Object Detection API. With FP16 TF-TRT optimization, everything works as expected and the model is accelerated well.

I’m using JetPack 4.3 with the precompiled tensorflow==1.15.2+nv20.3 package installed.

When I run INT8 optimization, everything seems fine until INT8 calibration starts. Then I get a "Segmentation fault (core dumped)" error with the following stack trace:

GDB stack trace
#0 0x0000007f540b1a7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
#1 0x0000007f4bb7634c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#2 0x0000007f4baf83e8 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#3 0x0000007f4b9e672c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#4 0x0000007f4b9e681c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#5 0x0000007f4ba946b4 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#6 0x0000007f4ba95070 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7 0x0000007f4b9cc578 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8 0x0000007f4b9d0854 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9 0x0000007f4baed260 in cuMemcpyHtoDAsync_v2 () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007f9b7df388 in cuMemcpyHtoDAsync_v2 () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#11 0x0000007f9b6e9670 in stream_executor::gpu::GpuDriver::AsynchronousMemcpyH2D(stream_executor::gpu::GpuContext*, unsigned long long, void const*, unsigned long long, CUstream_st*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#12 0x0000007f9b7963ec in stream_executor::Stream::ThenMemcpy(stream_executor::DeviceMemoryBase*, void const*, unsigned long long) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#13 0x0000007f9200966c in tensorflow::GPUUtil::CopyCPUTensorToGPU(tensorflow::Tensor const*, tensorflow::DeviceContext const*, tensorflow::Device*, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>, bool) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#14 0x0000007f9200a930 in tensorflow::GPUDeviceContext::CopyCPUTensorToDevice(tensorflow::Tensor const*, tensorflow::Device*, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>, bool) const () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#15 0x0000007f91ff6d54 in tensorflow::BaseGPUDevice::MaybeCopyTensorToGPU(tensorflow::AllocatorAttributes const&, tensorflow::Tensor const&, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#16 0x0000007f91ff994c in tensorflow::BaseGPUDevice::MakeTensorFromProto(tensorflow::TensorProto const&, tensorflow::AllocatorAttributes, tensorflow::Tensor*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#17 0x0000007f9821ac54 in tensorflow::ConstantOp::ConstantOp(tensorflow::OpKernelConstruction*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#18 0x0000007f9821af44 in tensorflow::{lambda(tensorflow::OpKernelConstruction*)#4}::_FUN () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#19 0x0000007f91e17cfc in tensorflow::CreateOpKernel(tensorflow::DeviceType, tensorflow::DeviceBase*, tensorflow::Allocator*, tensorflow::FunctionLibraryRuntime*, tensorflow::NodeDef const&, int, tensorflow::OpKernel**) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#20 0x0000007f9204868c in tensorflow::CreateNonCachedKernel(tensorflow::Device*, tensorflow::FunctionLibraryRuntime*, tensorflow::NodeDef const&, int, tensorflow::OpKernel**) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#21 0x0000007f92061680 in tensorflow::FunctionLibraryRuntimeImpl::CreateKernel(tensorflow::NodeDef const&, tensorflow::FunctionLibraryRuntime*, tensorflow::OpKernel**) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#22 0x0000007f92061c74 in tensorflow::FunctionLibraryRuntimeImpl::CreateKernel(tensorflow::NodeDef const&, tensorflow::OpKernel**) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#23 0x0000007f97500aac in std::_Function_handler<tensorflow::Status (tensorflow::NodeDef const&, tensorflow::OpKernel**), tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_deletetensorflow::DirectSession::ExecutorsAndKeys >, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_deletetensorflow::DirectSession::FunctionInfo >, tensorflow::DirectSession::RunStateArgs*)::{lambda(tensorflow::NodeDef const&, tensorflow::OpKernel**)#1}>::_M_invoke(std::_Any_data const&, tensorflow::NodeDef const&, tensorflow::OpKernel**&&) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#24 0x0000007f92054230 in tensorflow::(anonymous namespace)::ExecutorImpl::Initialize() () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#25 0x0000007f9205585c in tensorflow::NewLocalExecutor(tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, tensorflow::Executor**) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#26 0x0000007f920558e8 in tensorflow::(anonymous namespace)::DefaultExecutorRegistrar::Factory::NewExecutor(tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, std::unique_ptr<tensorflow::Executor, std::default_deletetensorflow::Executor >) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#27 0x0000007f92055fe0 in tensorflow::NewExecutor(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, std::unique_ptr<tensorflow::Executor, std::default_deletetensorflow::Executor >*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1
#28 0x0000007f9750f41c in tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_deletetensorflow::DirectSession::ExecutorsAndKeys >, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_deletetensorflow::DirectSession::FunctionInfo >, tensorflow::DirectSession::RunStateArgs*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#29 0x0000007f9751068c in tensorflow::DirectSession::GetOrCreateExecutors(absl::Span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const>, absl::Span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const>, absl::Span<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const>, tensorflow::DirectSession::ExecutorsAndKeys**, tensorflow::DirectSession::RunStateArgs*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#30 0x0000007f97511b74 in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<tensorflow::Tensor, std::allocatortensorflow::Tensor >, tensorflow::RunMetadata) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#31 0x0000007f94c8884c in tensorflow::SessionRef::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, std::vector<tensorflow::Tensor, std::allocatortensorflow::Tensor >, tensorflow::RunMetadata) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#32 0x0000007f9522f0fc in TF_Run_Helper(tensorflow::Session*, char const*, TF_Buffer const*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, TF_Tensor**, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&, TF_Buffer*, TF_Status*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#33 0x0000007f9522fcd8 in TF_SessionRun () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#34 0x0000007f94c86178 in tensorflow::TF_SessionRun_wrapper_helper(TF_Session*, char const*, TF_Buffer const*, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<_object*, std::allocator<_object*> > const&, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<TF_Operation*, std::allocator<TF_Operation*> > const&, TF_Buffer*, TF_Status*, std::vector<_object*, std::allocator<_object*> >) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#35 0x0000007f94c861c4 in tensorflow::TF_SessionRun_wrapper(TF_Session*, TF_Buffer const*, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<_object*, std::allocator<_object*> > const&, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<TF_Operation*, std::allocator<TF_Operation*> > const&, TF_Buffer*, TF_Status*, std::vector<_object*, std::allocator<_object*> >*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#36 0x0000007f94c4d310 in _wrap_TF_SessionRun_wrapper () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#37 0x00000000005b9bec in _PyCFunction_FastCallDict ()
#38 0x0000000000529958 in ?? ()
#39 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#40 0x0000000000527860 in ?? ()
#41 0x00000000005297dc in ?? ()
#42 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#43 0x0000000000528ff0 in ?? ()
#44 0x00000000005dd3a0 in ?? ()
#45 0x0000000000606c40 in PyObject_Call ()
#46 0x000000000052d1c0 in _PyEval_EvalFrameDefault ()
#47 0x0000000000528ff0 in ?? ()
#48 0x0000000000529584 in ?? ()
#49 0x00000000005297dc in ?? ()
#50 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#51 0x0000000000528ff0 in ?? ()
#52 0x0000000000529584 in ?? ()
#53 0x00000000005297dc in ?? ()
#54 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#55 0x0000000000527860 in ?? ()
#56 0x00000000005297dc in ?? ()
#57 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#58 0x0000000000528ff0 in ?? ()
#59 0x0000000000529584 in ?? ()
#60 0x00000000005297dc in ?? ()
#61 0x000000000052e800 in _PyEval_EvalFrameDefault ()
#62 0x0000000000528ff0 in ?? ()
#63 0x0000000000529584 in ?? ()
#64 0x00000000005297dc in ?? ()
#65 0x000000000052e800 in _PyEval_EvalFrameDefault ()

Could you please help me resolve this problem?
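
The Python frames in the trace above (#37 onward) are unresolved, so it does not show which Python call started the crashing session run. One possible way to recover that is the standard-library faulthandler module, which dumps the Python traceback of every thread when the process receives SIGSEGV. This is only a minimal sketch: the helper name enable_crash_traceback and the log file name are made up for illustration and are not part of the original scripts.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Write the Python traceback of all threads to log_path on a fatal signal.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and
    SIGILL. The returned file object must stay open for the handler to work.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "tftrt_int8_crash.log")
crash_log = enable_crash_traceback(log_path)

# ... run the TF-TRT INT8 conversion and calibration after this point ...
```

If the process segfaults again, the log file should contain the Python-level call stack, which can be posted alongside the GDB trace.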

Hi,

We would like to reproduce this issue in our environment.
Could you share a simple TensorFlow script that reproduces it?

Thanks.

Hi @AastaLLL, thanks for the fast response. Here is a Google Drive link to an archive that should be enough to reproduce the issue:

Our optimization scripts are based on: https://github.com/tensorflow/tensorrt/tree/r1.14+/tftrt/examples/object_detection

I forgot to mention that we’re running the optimization on a Jetson AGX.
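
For readers without access to the archive, the INT8 path in scripts of this kind roughly follows the sketch below. It assumes the TF 1.15 TrtGraphConverter API (convert() followed by calibrate()); the function names, default values, and the way batches are fed are illustrative assumptions, not the actual code from the archive or the linked example.

```python
def int8_converter_kwargs(frozen_graph_def, output_names,
                          max_batch_size=1, workspace_bytes=1 << 28):
    """Keyword arguments for TrtGraphConverter in INT8 calibration mode.

    The defaults are placeholders, not tuned values.
    """
    return dict(
        input_graph_def=frozen_graph_def,
        nodes_blacklist=list(output_names),  # leave output nodes untouched
        precision_mode="INT8",
        use_calibration=True,
        max_batch_size=max_batch_size,
        max_workspace_size_bytes=workspace_bytes,
    )


def convert_and_calibrate(frozen_graph_def, output_names, input_name, batches):
    """Convert a frozen graph and run INT8 calibration over a list of batches.

    input_name is a tensor name such as "image_tensor:0" (an assumption about
    the model); batches is a list of NumPy arrays to feed during calibration.
    """
    # Imported lazily so this file still loads where TensorFlow is missing.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverter(
        **int8_converter_kwargs(frozen_graph_def, output_names))
    converter.convert()  # builds the graph used for calibration

    batch_iter = iter(batches)

    def feed_dict_fn():
        # Called once per calibration run; maps tensor name to input data.
        return {input_name: next(batch_iter)}

    # The segmentation fault reported above happens somewhere in this phase.
    return converter.calibrate(
        fetch_names=[name + ":0" for name in output_names],
        num_runs=len(batches),
        feed_dict_fn=feed_dict_fn,
    )
```

Switching precision_mode to "FP16" and skipping calibrate() gives the FP16 path that reportedly works, which may help narrow the failure to the calibration runs.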

Any update?

Hi,

Sorry, we tried to reproduce this issue but got stuck on the missing object_detection module.

This looks more like a TensorFlow issue.
Could you run the same script in a desktop environment to see whether it works there?

Thanks.
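
When comparing the Jetson run with a desktop run, it may help to record the environment on both machines so that differences are visible side by side. A small sketch, with a made-up function name, that reports only what can be read without extra dependencies:

```python
import platform
import sys


def environment_report():
    """Collect basic facts about the machine and the TensorFlow install."""
    report = {
        "machine": platform.machine(),  # e.g. aarch64 on Jetson, x86_64 on desktop
        "system": platform.system(),
        "python": sys.version.split()[0],
    }
    try:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
    except ImportError:
        report["tensorflow"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("{}: {}".format(key, value))
```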

You’re right that the missing object_detection module is a TensorFlow-related issue, but you can install it by following https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md.
After that, you should be able to run the optimization.

Best!
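
A quick way to confirm the installation before running the optimization is to check whether the required top-level modules can be found on the Python path. This is a sketch only; the helper name missing_modules is invented here, and the module list is an assumption about what the scripts import.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed requirements of the optimization scripts.
    missing = missing_modules(["tensorflow", "object_detection"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

If object_detection is reported as missing after installation, the research directory from the installation guide is probably not on PYTHONPATH.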

Hi,

Sorry for the late update.

Based on your log, this looks more like a TensorFlow issue.

#36 0x0000007f94c4d310 in _wrap_TF_SessionRun_wrapper () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so

The error is raised from within a TensorFlow internal library, so we may not have enough information to debug it.
Does this also occur in a desktop environment? Have you checked this issue with the TensorFlow team?

Thanks.