I am getting a segmentation fault while training my neural network.
$ python tools/train_lanenet.py
The output is as follows:
2021-06-02 13:47:51.200238: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
W0602 13:47:51.293999 15212 deprecation.py:40] Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
W0602 13:47:56.888735 15212 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
- https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
- https://github.com/tensorflow/addons
- https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0602 13:47:58.538597 15212 module_wrapper.py:139] From /home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.read_file is deprecated. Please use tf.io.read_file instead.
W0602 13:47:58.540126 15212 module_wrapper.py:139] From /home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2021-06-02 13:48:25.111512: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-06-02 13:48:25.111947: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x13a23150 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-02 13:48:25.111996: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-06-02 13:48:25.119393: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-02 13:48:25.217014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-06-02 13:48:25.217320: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x15064ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-06-02 13:48:25.217370: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2021-06-02 13:48:25.217657: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-06-02 13:48:25.217760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3
pciBusID: 0000:00:00.0
2021-06-02 13:48:25.217827: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-02 13:48:25.221976: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-06-02 13:48:25.224597: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-02 13:48:25.225339: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-02 13:48:25.229724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-02 13:48:25.232906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-06-02 13:48:25.233694: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-02 13:48:25.233889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-06-02 13:48:25.234066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-06-02 13:48:25.234137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-06-02 13:48:25.234214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-06-02 13:48:26.602513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-02 13:48:26.602598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2021-06-02 13:48:26.602625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2021-06-02 13:48:26.602919: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-06-02 13:48:26.603123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-06-02 13:48:26.603250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6672 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
I0602 13:48:29.795616 15212 train_lanenet.py:232] Training from scratch
2021-06-02 13:49:00.076055: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
Segmentation fault (core dumped)
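The log above stops right after libcublas is loaded and gives no Python traceback. One thing that might help narrow this down is the standard-library `faulthandler` module, which prints the Python stack when the process receives SIGSEGV. The following is only a minimal sketch: `enable_crash_traceback` is a hypothetical helper name, and I am assuming it would be called at the top of tools/train_lanenet.py before any TensorFlow session is created.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump the Python traceback of all threads on fatal signals such as SIGSEGV.

    `stream` must be a file object with a real file descriptor; it defaults to
    the interpreter's original stderr.
    """
    if stream is None:
        stream = sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_traceback()
    # Assumption: the real training entry point would be invoked after this line.
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect should be available without editing the script by running `python -X faulthandler tools/train_lanenet.py`. This only shows which Python call was active at the time of the crash; it does not explain the fault inside the native library.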
I also ran the script under cuda-memcheck:
$ cuda-memcheck python tools/train_lanenet.py
It produces the following output:
========= CUDA-MEMCHECK
2021-06-02 13:49:22.762973: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
W0602 13:49:22.858016 15263 deprecation.py:40] Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
W0602 13:49:28.682472 15263 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
- https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
- https://github.com/tensorflow/addons
- https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0602 13:49:30.281486 15263 module_wrapper.py:139] From /home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.read_file is deprecated. Please use tf.io.read_file instead.
W0602 13:49:30.282983 15263 module_wrapper.py:139] From /home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2021-06-02 13:49:56.950472: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-06-02 13:49:56.951233: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x26a59c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-02 13:49:56.951288: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-06-02 13:49:56.958765: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
========= Program hit CUDA_ERROR_UNKNOWN (error 999) due to "unknown error" on CUDA API call to cuDevicePrimaryCtxRetain.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuDevicePrimaryCtxRetain + 0x114) [0x1d235c]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so [0x896dce4]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so (_ZN15stream_executor3gpu9GpuDriver13CreateContextEiiRKNS_13DeviceOptionsEPPNS0_10GpuContextE + 0x160) [0x88667b8]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so (_ZN15stream_executor3gpu11GpuExecutor4InitEiNS_13DeviceOptionsE + 0x14c) [0x6fac5c4]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so (_ZN15stream_executor14StreamExecutor4InitENS_13DeviceOptionsE + 0x78) [0x8941a30]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/…/libtensorflow_framework.so.1 (_ZN15stream_executor3gpu12CudaPlatform19GetUncachedExecutorERKNS_20StreamExecutorConfigE + 0x1a8) [0x103f6b0]
2021-06-02 13:49:57.063774: W tensorflow/compiler/xla/service/platform_util.cc:210] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/…/libtensorflow_framework.so.1 [0x103e704]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so (_ZN15stream_executor13ExecutorCache11GetOrCreateERKNS_20StreamExecutorConfigERKSt8functionIFNS_4port8StatusOrISt10unique_ptrINS_14StreamExecutorESt14default_deleteIS8_EEEEvEE + 0x268) [0x8957820]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/…/libtensorflow_framework.so.1 (_ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE + 0x50) [0x103e7d8]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so [0x6e8f9c4]
========= Host Frame:/home/nvidia/lane-det/lane-det-venv/lib/python3.6/site-packages/tensorflow_core/python/…/libtensorflow_framework.so.1 (_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi + 0x3ec) [0xbf5b6c]
========= 2021-06-02 13:49:57.064913: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found
System Information:
Python: 3.6
JetPack: 4.5.1
TensorFlow: 1.15.5+nv21.5
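In case it helps anyone reproduce this, the Python and TensorFlow versions listed above can be read from inside the virtual environment with a few lines of Python. This is a rough sketch: `collect_system_info` is a hypothetical helper name, and the JetPack version is not covered because I am not aware of a standard-library call that reports it.

```python
import platform


def collect_system_info():
    """Return interpreter, architecture, and TensorFlow version as a dict."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # expected to be "aarch64" on a Jetson board
    }
    try:
        import tensorflow as tf  # imported lazily so the helper works without it
        info["tensorflow"] = tf.__version__
    except Exception as exc:  # ImportError, or a failure while loading native libraries
        info["tensorflow"] = "not importable: {}".format(type(exc).__name__)
    return info


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print("{}: {}".format(key, value))
```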