When training ssd in jetson-inference/python/training/detection/ssd/
with
python3 train_ssd.py --data=data/gemi --model-dir=models/gemi --batch-size=4 --epochs=30
it shows me this:
2020-10-09 14:28:09 - Epoch: 0, Step: 6470/6880, Avg Loss: 4.9857, Avg Regression Loss 2.5101, Avg Classification Loss: 2.4756
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
File "/home/sadsavunma/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/lib/python3.6/multiprocessing/queues.py", line 104, in get
if not self._poll(timeout):
File "/usr/lib/python3.6/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
r = wait([self], timeout)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 911, in wait
ready = selector.select(timeout)
File "/usr/lib/python3.6/selectors.py", line 376, in select
fd_event_list = self._poll.poll(timeout)
File "/home/sadsavunma/.local/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 24483) is killed by signal: Segmentation fault.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_ssd.py", line 343, in <module>
device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
File "train_ssd.py", line 113, in train
for i, data in enumerate(loader):
File "/home/sadsavunma/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/home/sadsavunma/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
idx, data = self._get_data()
File "/home/sadsavunma/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
success, data = self._try_get_data()
File "/home/sadsavunma/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 792, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))