Python subprocess and TensorFlow: Segmentation fault (core dumped)

Hi all,

Running Environment:

Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
2020-07-07 16:36:06.559166: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
>>> tensorflow.__version__
'1.15.0'
>>> 

I am trying to measure the power consumption of an AGX Xavier while it runs NN inference. The code looks like this:

import subprocess
import time

if report_power:
    # Start the power sampler; it writes its readings to stderr.
    power_process = subprocess.Popen(
        [str(dir_name) + "power", "/sys/bus/i2c/drivers/ina3221x/1-0041/iio_device/in_power2_input"],
        shell=False, stderr=subprocess.PIPE)

# Time a single inference call.
start = time.time()
predictions = sess.run(image_tensor, {input_tensor: images})
times.append(time.time() - start)
for precision in predictions:
    result_all.append(precision)

if report_power:
    # Stop the sampler, then parse the whitespace-separated readings.
    power_process.terminate()
    powers = str(power_process.stderr.read(), 'utf-8').split()
    powers = list(map(float, powers))

When report_power is 0, this code works fine. When report_power is 1, it crashes with a segmentation fault as soon as it executes predictions = sess.run(image_tensor, {input_tensor: images}). (power is a C program that continuously reads a file. I also tried replacing power with a simple program that only prints a number to stderr, but the error persists.) The backtrace from gdb:

#0  0x0000007f4c7bca7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
#1  0x0000007f50c8634c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#2  0x0000007f50c083e8 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#3  0x0000007f50af672c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#4  0x0000007f50af681c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#5  0x0000007f50ba4d70 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#6  0x0000007f50ba4fcc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7  0x0000007f50adc578 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8  0x0000007f50ae0854 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9  0x0000007f50bfd260 in cuMemcpyHtoDAsync_v2 () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007f90726a90 in cuMemcpyHtoDAsync_v2 () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#11 0x0000007f90630d78 in stream_executor::gpu::GpuDriver::AsynchronousMemcpyH2D(stream_executor::gpu::GpuContext*, unsigned long long, void const*, unsigned long long, CUstream_st*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#12 0x0000007f906ddaf4 in stream_executor::Stream::ThenMemcpy(stream_executor::DeviceMemoryBase*, void const*, unsigned long long) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#13 0x0000007f86f8bdb4 in tensorflow::GPUUtil::CopyCPUTensorToGPU(tensorflow::Tensor const*, tensorflow::DeviceContext const*, tensorflow::Device*, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>, bool) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#14 0x0000007f86f8d078 in tensorflow::GPUDeviceContext::CopyCPUTensorToDevice(tensorflow::Tensor const*, tensorflow::Device*, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>, bool) const
    () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#15 0x0000007f86f7949c in tensorflow::BaseGPUDevice::MaybeCopyTensorToGPU(tensorflow::AllocatorAttributes const&, tensorflow::Tensor const&, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#16 0x0000007f86f7c094 in tensorflow::BaseGPUDevice::MakeTensorFromProto(tensorflow::TensorProto const&, tensorflow::AllocatorAttributes, tensorflow::Tensor*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#17 0x0000007f8d186afc in tensorflow::ConstantOp::ConstantOp(tensorflow::OpKernelConstruction*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#18 0x0000007f8d186dec in tensorflow::{lambda(tensorflow::OpKernelConstruction*)#4}::_FUN () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#19 0x0000007f86d9d444 in tensorflow::CreateOpKernel(tensorflow::DeviceType, tensorflow::DeviceBase*, tensorflow::Allocator*, tensorflow::FunctionLibraryRuntime*, tensorflow::NodeDef const&, int, tensorflow::OpKernel**) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#20 0x0000007f86fc7564 in tensorflow::CreateNonCachedKernel(tensorflow::Device*, tensorflow::FunctionLibraryRuntime*, tensorflow::NodeDef const&, int, tensorflow::OpKernel**) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#21 0x0000007f86fe10c0 in tensorflow::FunctionLibraryRuntimeImpl::CreateKernel(tensorflow::NodeDef const&, tensorflow::FunctionLibraryRuntime*, tensorflow::OpKernel**) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#22 0x0000007f86fe16b4 in tensorflow::FunctionLibraryRuntimeImpl::CreateKernel(tensorflow::NodeDef const&, tensorflow::OpKernel**) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#23 0x0000007f8c47f75c in std::_Function_handler<tensorflow::Status (tensorflow::NodeDef const&, tensorflow::OpKernel**), tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys> >*, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_delete<tensorflow::DirectSession::FunctionInfo> >*, tensorflow::DirectSession::RunStateArgs*)::{lambda(tensorflow::NodeDef const&, tensorflow::OpKernel**)#1}>::_M_invoke(std::_Any_data const&, tensorflow::NodeDef const&, tensorflow::OpKernel**&&) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#24 0x0000007f86fd3b58 in tensorflow::(anonymous namespace)::ExecutorImpl::Initialize() () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#25 0x0000007f86fd529c in tensorflow::NewLocalExecutor(tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, tensorflow::Executor**)
    () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#26 0x0000007f86fd5328 in tensorflow::(anonymous namespace)::DefaultExecutorRegistrar::Factory::NewExecutor(tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, std::unique_ptr<tensorflow::Executor, std::default_delete<tensorflow::Executor> >*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#27 0x0000007f86fd5a20 in tensorflow::NewExecutor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, std::unique_ptr<tensorflow::Executor, std::default_delete<tensorflow::Executor> >*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#28 0x0000007f8c48e0cc in tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys> >*, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_delete<tensorflow::DirectSession::FunctionInfo> >*, tensorflow::DirectSession::RunStateArgs*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#29 0x0000007f8c48f33c in tensorflow::DirectSession::GetOrCreateExecutors(absl::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, absl::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, absl::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, tensorflow::DirectSession::ExecutorsAndKeys**, tensorflow::DirectSession::RunStateArgs*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#30 0x0000007f8c490824 in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#31 0x0000007f89c04294 in tensorflow::SessionRef::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#32 0x0000007f8a1abaec in TF_Run_Helper(tensorflow::Session*, char const*, TF_Buffer const*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TF_Tensor**, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TF_Buffer*, TF_Status*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#33 0x0000007f8a1ac6c8 in TF_SessionRun () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#34 0x0000007f89c01bc0 in tensorflow::TF_SessionRun_wrapper_helper(TF_Session*, char const*, TF_Buffer const*, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<_object*, std::allocator<_object*> > const&, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<TF_Operation*, std::allocator<TF_Operation*> > const&, TF_Buffer*, TF_Status*, std::vector<_object*, std::allocator<_object*> >*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#35 0x0000007f89c01c0c in tensorflow::TF_SessionRun_wrapper(TF_Session*, TF_Buffer const*, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<_object*, std::allocator<_object*> > const&, std::vector<TF_Output, std::allocator<TF_Output> > const&, std::vector<TF_Operation*, std::allocator<TF_Operation*> > const&, TF_Buffer*, TF_Status*, std::vector<_object*, std::allocator<_object*> >*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#36 0x0000007f89bc8d58 in _wrap_TF_SessionRun_wrapper () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#37 0x00000000005b9bec in _PyCFunction_FastCallDict ()
#38 0x0000000000529958 in ?? ()
#39 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#40 0x0000000000527860 in ?? ()
#41 0x00000000005297dc in ?? ()
#42 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#43 0x0000000000528ff0 in ?? ()
#44 0x00000000005dd3a0 in ?? ()
#45 0x0000000000606c40 in PyObject_Call ()
#46 0x000000000052d1c0 in _PyEval_EvalFrameDefault ()
#47 0x0000000000528ff0 in ?? ()
#48 0x0000000000529584 in ?? ()
#49 0x00000000005297dc in ?? ()
#50 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#51 0x0000000000528ff0 in ?? ()
#52 0x0000000000529584 in ?? ()
#53 0x00000000005297dc in ?? ()
#54 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#55 0x0000000000527860 in ?? ()
#56 0x00000000005297dc in ?? ()
#57 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#58 0x0000000000528ff0 in ?? ()
#59 0x0000000000529584 in ?? ()
#60 0x00000000005297dc in ?? ()
#61 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#62 0x0000000000527860 in ?? ()
#63 0x00000000005297dc in ?? ()
#64 0x000000000052e5a8 in _PyEval_EvalFrameDefault ()
#65 0x0000000000528ff0 in ?? ()
#66 0x0000000000630438 in PyRun_FileExFlags ()
#67 0x0000000000635acc in PyRun_SimpleFileExFlags ()
#68 0x00000000006202c8 in Py_Main ()
#69 0x0000000000420d3c in main ()
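
For anyone who wants to try the subprocess part in isolation, here is a minimal sketch of the pattern described above with TensorFlow removed. It is only an illustration under assumptions: the child process is a hypothetical stand-in for the C power program (a small Python loop that prints a fixed number to stderr), and sample_while and fake_inference are made-up names, not part of the original script. The workload is passed in as a function so that sess.run could be substituted for it.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the C "power" sampler: it prints one number
# per line to stderr until it is terminated.
CHILD_CODE = (
    "import sys, time\n"
    "while True:\n"
    "    sys.stderr.write('1.5\\n')\n"
    "    sys.stderr.flush()\n"
    "    time.sleep(0.01)\n"
)


def sample_while(workload):
    """Run workload() while a child process writes readings to stderr."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        shell=False,
        stderr=subprocess.PIPE,
    )
    try:
        start = time.time()
        result = workload()
        elapsed = time.time() - start
    finally:
        # Stop the sampler, then collect everything it wrote and reap it.
        proc.terminate()
        _, err = proc.communicate()
    samples = [float(tok) for tok in err.decode("utf-8").split()]
    return result, elapsed, samples


def fake_inference():
    # Placeholder for sess.run(...); it just sleeps briefly.
    time.sleep(0.2)
    return "done"


if __name__ == "__main__":
    result, elapsed, samples = sample_while(fake_inference)
    print(result, round(elapsed, 3), len(samples))
```

The sketch uses communicate() after terminate() instead of reading stderr directly, so the child is also waited on. Since the parent only reads the pipe at the end, a sampler that writes a lot during a long run could fill the pipe buffer and block; that is not the crash reported here, just something to keep in mind.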

Thank you!

Hi,

Could you share a complete, reproducible source so we can check it in our environment?
Thanks.

Hi AastaLLL,

Sorry, my mistake. Today I tried to extract the main part of the code to share with you, and found that the extracted version runs without error. I then went back to my original code and, surprisingly, it works too. I have not modified the code. I had tried rebooting the Xavier before today, but that did not help. I don't know why it works now, but there is currently no problem. If it stops working again, I will come back for help.

Thank you.