YOLO V4 not training

While training YOLOv4 on a single GPU, I get the following error:

[ef112f939b52:56258] *** Process received signal ***
[ef112f939b52:56258] Signal: Segmentation fault (11)
[ef112f939b52:56258] Signal code: Address not mapped (1)
[ef112f939b52:56258] Failing at address: 0x10
[ef112f939b52:56258] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f70577c4040]
[ef112f939b52:56258] [ 1] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow8BinaryOpIN5Eigen9GpuDeviceENS_7functor3addIfEEE7ComputeEPNS_15OpKernelContextE+0x100)[0x7f6f6a50ff90]
[ef112f939b52:56258] [ 2] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x522)[0x7f6f642fb382]
[ef112f939b52:56258] [ 3] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0xf978ab)[0x7f6f6435c8ab]
[ef112f939b52:56258] [ 4] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0xf97c6f)[0x7f6f6435cc6f]
[ef112f939b52:56258] [ 5] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x281)[0x7f6f6440c791]
[ef112f939b52:56258] [ 6] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x48)[0x7f6f64409df8]
[ef112f939b52:56258] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd6df)[0x7f70556c86df]
[ef112f939b52:56258] [ 8] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f705756d6db]
[ef112f939b52:56258] [ 9] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f70578a671f]
[ef112f939b52:56258] *** End of error message ***
Segmentation fault (core dumped)
Traceback (most recent call last):
  File "/usr/local/bin/yolo_v4", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/entrypoint/yolo_v4.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.

Hi,
We recommend that you check the sample links below, as they might address your concern:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#samples
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/quick-start-guide/index.html#framework-integration
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#integrate-ovr
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usingtftrt

If the issue persists, please share the model and script so that we can try to reproduce it on our end.
Thanks!

Hi @bhargavi.sanadhya,

Please share more details. Based on the information you’ve provided, this doesn’t look like a TensorRT-related issue.
We recommend posting your question on the relevant platform to get better help.
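
As a rough illustration of that reading of the trace, the sketch below groups the backtrace frames by the shared library they come from. It is only a sketch: the regular expression and the names `FRAME_RE`, `frames`, and `SAMPLE` are made up for this example, and `SAMPLE` holds just four of the frames from the log above, copied as posted (including the elided `…` path segment).

```python
import re
from collections import Counter

# Four frames copied from the posted log (hypothetical sample, not the full trace).
SAMPLE = """\
[ef112f939b52:56258] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7f70577c4040]
[ef112f939b52:56258] [ 1] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow8BinaryOpIN5Eigen9GpuDeviceENS_7functor3addIfEEE7ComputeEPNS_15OpKernelContextE+0x100)[0x7f6f6a50ff90]
[ef112f939b52:56258] [ 2] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x522)[0x7f6f642fb382]
[ef112f939b52:56258] [ 9] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f70578a671f]
"""

# Matches "[ N] /path/to/lib(symbol+offset)[0xaddress]"; an assumed pattern
# fitted to the lines above, not an official log format.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\((.*?)\)\[0x[0-9a-f]+\]")


def frames(log_text):
    """Yield (frame index, library basename, symbol+offset) per backtrace frame."""
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            index, path, symbol = match.groups()
            yield int(index), path.rsplit("/", 1)[-1], symbol


# Count how many frames come from each shared library.
libs = Counter(lib for _, lib, _ in frames(SAMPLE))
for name, count in libs.most_common():
    print(f"{count} frame(s) in {name}")
```

In this sample every frame resolves to libc or to a TensorFlow library, and none to TensorRT's `libnvinfer`. Frame 1 is the mangled name of TensorFlow's GPU `BinaryOp` add kernel for `float` (a tool such as `c++filt` can demangle it), which suggests the crash happens inside a TensorFlow op rather than in TensorRT.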

Thank you.