CUDA launch failure during TensorFlow object detection training

Hi there,

I have been using the TensorFlow Object Detection API to train a MobileNet v1 object detection model on Ubuntu.
I have a Quadro K5200 graphics card:

$ nvidia-smi
Wed Jul 24 09:16:41 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K5200        Off  | 00000000:03:00.0  On |                  Off |
| 26%   37C    P8    14W / 150W |    392MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1570      G   /usr/lib/xorg/Xorg                            32MiB |
|    0      1617      G   /usr/bin/gnome-shell                          53MiB |
|    0      2266      G   /usr/lib/xorg/Xorg                           148MiB |
|    0      2420      G   /usr/bin/gnome-shell                         153MiB |
+-----------------------------------------------------------------------------+

I noticed that training stopped with the error CUDA_ERROR_LAUNCH_FAILED. Could someone please advise what the problem is and how I can resolve it?

INFO:tensorflow:global step 27499: loss = 0.9869 (0.690 sec/step)
INFO:tensorflow:global step 27499: loss = 0.9869 (0.690 sec/step)
INFO:tensorflow:global step 27500: loss = 1.2323 (0.736 sec/step)
INFO:tensorflow:global step 27500: loss = 1.2323 (0.736 sec/step)
INFO:tensorflow:global step 27501: loss = 1.1242 (0.677 sec/step)
INFO:tensorflow:global step 27501: loss = 1.1242 (0.677 sec/step)
INFO:tensorflow:global step 27502: loss = 1.3251 (0.720 sec/step)
INFO:tensorflow:global step 27502: loss = 1.3251 (0.720 sec/step)
2019-07-23 22:13:35.131678: E tensorflow/stream_executor/cuda/cuda_driver.cc:1000] could not wait stream on event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-07-23 22:13:35.131640: E tensorflow/stream_executor/cuda/cuda_driver.cc:1131] failed to enqueue async memcpy from host to device: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure; GPU dst: 0x76d445b00; host src: 0x7fc500874c00; size: 4194304=0x400000
2019-07-23 22:13:35.131733: E tensorflow/stream_executor/cuda/cuda_driver.cc:1000] could not wait stream on event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-07-23 22:13:35.131735: I tensorflow/stream_executor/stream.cc:5027] [stream=0xdc71cc0,impl=0xccaeaa0] did not memcpy host-to-device; source: 0x203113f00
2019-07-23 22:13:35.131772: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-07-23 22:13:35.131768: E tensorflow/stream_executor/cuda/cuda_driver.cc:1000] could not wait stream on event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-07-23 22:13:35.131757: I tensorflow/stream_executor/stream.cc:5027] [stream=0xdc71cc0,impl=0xccaeaa0] did not memcpy host-to-device; source: 0x20315b600
2019-07-23 22:13:35.131776: E tensorflow/stream_executor/cuda/cuda_driver.cc:1000] could not wait stream on event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-07-23 22:13:35.131751: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] CUDNN_STATUS_MAPPING_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(2474): 'cudnnConvolutionForward( cudnn.handle(), alpha, input_nd.handle(), input_data.opaque(), filter.handle(), filter_data.opaque(), conv.handle(), ToConvForwardAlgo(algo_desc), scratch.opaque(), scratch.size(), beta, output_nd.handle(), output_data->opaque())'
2019-07-23 22:13:35.131792: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
2019-07-23 22:13:35.131798: I tensorflow/stream_executor/stream.cc:5027] [stream=0xdc71cc0,impl=0xccaeaa0] did not memcpy host-to-device; source: 0x203113600
2019-07-23 22:13:35.131836: I tensorflow/stream_executor/stream.cc:5027] [stream=0xdc71cc0,impl=0xccaeaa0] did not memcpy host-to-device; source: 0x20315b500
Aborted (core dumped)

Thanks
Amin

May I know which version of TensorFlow you are using? In my experience, a certain version (1.12 in my case) caused problems, while a lower version has been fine so far.
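
If it helps, here is a rough sketch for reporting which TensorFlow package is installed without importing TensorFlow itself. It assumes Python 3.8 or newer (for importlib.metadata) and that TensorFlow was installed with pip as either tensorflow or tensorflow-gpu; the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: in the 1.x series the GPU build was shipped as a
    # separate pip package, so both names are checked.
    for name in ("tensorflow", "tensorflow-gpu"):
        print(name, installed_version(name) or "not installed")
```

Posting that output together with the CUDA and cuDNN versions should make it easier to compare setups.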