Segmentation fault when running TF detection tutorial

Hello,

I am getting started with TF on a Quadro P600 under Ubuntu 18.04.
I installed the latest driver, Docker 19.03, and the nvidia-container-toolkit package.
My short-term goal is to be able to run and train CNNs for image detection.

I am trying to run the following tutorial: Google Colab

During the following call:

output_dict = model(input_tensor)

The Python kernel crashes with a segmentation fault.

I tried running TF with both the NVIDIA Docker image (nvcr.io/nvidia/tensorflow:19.11-tf2-py3) and the Docker image provided by TF, and I get the same segfault with both. I also ran the code on CPU only, and it works fine.
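
For anyone who wants to repeat the CPU-only check: the post does not say how it was done, so the following is only a minimal sketch of one common way, hiding the GPU from the process through the CUDA_VISIBLE_DEVICES environment variable. The helper name force_cpu_only is made up for illustration.

```python
import os


def force_cpu_only():
    """Hide all CUDA devices from this process (hypothetical helper).

    Call this before TensorFlow is imported for the first time,
    since the variable is read when CUDA is initialized.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_only()

# Import TensorFlow only after the variable is set, for example:
#   import tensorflow as tf
#   print(tf.config.experimental.list_physical_devices("GPU"))
# With the GPU hidden, that list should come back empty and
# model(input_tensor) should run on the CPU.
```

If the same script works with the GPU hidden and crashes with it visible, that points at a GPU kernel rather than at the model or the input data.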

Can you help?

Many thanks in advance.
Regards,
Gilles

I ran a Python script with equivalent code under gdb.
Here is the backtrace of the segfault:

#0  0x00007fc2808636e9 in tensorflow::NonMaxSuppressionV2GPUOp::Compute(tensorflow::OpKernelContext*) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so
#1  0x00007fc27c1a7b52 in tensorflow::BaseGPUDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#2  0x00007fc27c205339 in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#3  0x00007fc27c2058ff in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(absl::InlinedVector<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, 8ul, std::allocator<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode> > const&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#4  0x00007fc27c2b6121 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#5  0x00007fc27c2b3818 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.2
#6  0x00007fc2761e166f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007fc2e50096db in start_thread (arg=0x7fc0bffff700) at pthread_create.c:463
#8  0x00007fc2e534288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
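
The gdb backtrace above shows the native frames only. As a complement (not something done in this thread), Python's standard faulthandler module can print the Python-level traceback when the process receives SIGSEGV, which shows which line of the script was executing. A minimal sketch:

```python
import faulthandler

# Install handlers that dump the Python traceback of every thread
# to stderr when the process gets SIGSEGV, SIGFPE, SIGABRT, SIGBUS
# or SIGILL.
faulthandler.enable(all_threads=True)

# ... then run the failing code as usual, for example the call
# from the tutorial:
#   output_dict = model(input_tensor)
```

The same effect is available without editing the script by starting the interpreter with `python3 -X faulthandler`.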

Well, I think I figured it out myself.
The bug is in TF, and it is fixed in the nightly release: https://github.com/tensorflow/tensorflow/issues/32261