Dear Experts,
I am training on my custom dataset (two object classes) with train_ssd.py on a Jetson Xavier NX development board. The training command is:
python3 train_ssd.py --dataset-type=voc --data=data/Food --model-dir=models/Food --batch-size=4 --epochs=100
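With --dataset-type=voc, the --data directory is expected to follow a Pascal VOC style layout. The sketch below assumes the standard layout (ImageSets/Main/trainval.txt listing image IDs, Annotations/<id>.xml and JPEGImages/<id>.jpg); which split file train_ssd.py actually reads is an assumption here, and `check_voc_layout` is a hypothetical helper, not part of train_ssd.py. It reports IDs whose annotation or image is missing, plus images that fail a rough JPEG marker-byte heuristic (a file can pass and still be damaged).

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_complete_jpeg(path, tail_bytes=1024):
    """Heuristic only: file starts with SOI and has an EOI marker near the end."""
    data = Path(path).read_bytes()
    return data.startswith(JPEG_SOI) and JPEG_EOI in data[-tail_bytes:]


def check_voc_layout(root, split_file="ImageSets/Main/trainval.txt"):
    """Return (image_id, problem) tuples for a VOC-style dataset directory.

    The split file name is an assumption; pass a different one if needed.
    """
    root = Path(root)
    problems = []
    ids = [line.strip() for line in (root / split_file).read_text().splitlines()]
    for image_id in filter(None, ids):  # skip blank lines
        xml_path = root / "Annotations" / f"{image_id}.xml"
        jpg_path = root / "JPEGImages" / f"{image_id}.jpg"
        if not xml_path.is_file():
            problems.append((image_id, "missing annotation"))
        if not jpg_path.is_file():
            problems.append((image_id, "missing image"))
        elif not looks_like_complete_jpeg(jpg_path):
            problems.append((image_id, "possibly truncated or non-JPEG image"))
    return problems
```

For example, `check_voc_layout("data/Food")` should return an empty list when every listed ID has both files and each image passes the marker check.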
At the 51st epoch, training crashed with an unexpected segmentation fault in a DataLoader worker:
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
  File "/home/gbewegung/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/home/gbewegung/.local/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 13276) is killed by signal: Segmentation fault.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_ssd.py", line 396, in <module>
    train(train_loader, net, criterion, optimizer, device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 133, in train
    for i, data in enumerate(loader):
  File "/home/gbewegung/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/gbewegung/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
    idx, data = self._get_data()
  File "/home/gbewegung/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 941, in _get_data
    success, data = self._try_get_data()
  File "/home/gbewegung/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 792, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
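Because the fault happens inside a DataLoader worker process, the traceback above only shows the main process noticing that the worker died, not where the worker crashed. A minimal sketch of one way to localize it, assuming the training dataset object supports len() and integer indexing (as PyTorch map-style datasets do): load every sample in the main process with Python's standard faulthandler module enabled, which dumps the Python stack if a segmentation fault occurs, and print each index before loading it so the last printed index identifies the sample in progress. `scan_dataset` is a hypothetical helper, not part of train_ssd.py.

```python
import faulthandler
import sys


def scan_dataset(dataset, log=None):
    """Load every sample in the current process, one index at a time.

    Returns (index, exception) pairs for samples that raised a Python
    exception. A segmentation fault cannot be caught, but with faulthandler
    enabled the interpreter dumps the Python stack to stderr before dying,
    and the last "loading index N" line shows which sample was being loaded.
    """
    if log is None:
        log = sys.stderr
    # Dump Python tracebacks on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=sys.__stderr__)
    failures = []
    for index in range(len(dataset)):
        print(f"loading index {index}", file=log, flush=True)  # flush before loading
        try:
            dataset[index]
        except Exception as exc:  # ordinary Python errors: record and continue
            failures.append((index, exc))
    return failures
```

Within PyTorch itself, passing num_workers=0 to torch.utils.data.DataLoader also loads samples in the main process, so a crash surfaces there instead of as a killed worker; how train_ssd.py builds its DataLoader is not shown in the traceback.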
Note: the Jetson Xavier NX completed all 100 epochs when I used 1,200 training images. When I increased the dataset to 1,600 images, I got this segmentation fault.
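Since the crash appeared only after the dataset grew from 1,200 to 1,600 images, the newly added annotations are one place worth ruling in or out. A minimal sketch, assuming standard Pascal VOC XML (<size>, <object>/<name>, <bndbox> with xmin/ymin/xmax/ymax); it flags files that fail to parse, objects whose class name is not in the expected set, and boxes that are empty, inverted, or outside the image. Whether any of these would actually trigger a segfault in train_ssd.py is not established here, and `check_annotations` and the class names in the test are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

COORD_TAGS = ("xmin", "ymin", "xmax", "ymax")


def _number(parent, tag):
    """Return parent/<tag> as a float, or None if missing or not numeric."""
    if parent is None:
        return None
    try:
        return float(parent.findtext(tag))
    except (TypeError, ValueError):
        return None


def check_annotation(xml_path, class_names):
    """Return a list of problem strings for one Pascal VOC annotation file."""
    try:
        root = ET.parse(str(xml_path)).getroot()
    except ET.ParseError as exc:
        return [f"XML parse error: {exc}"]

    problems = []
    size = root.find("size")
    width, height = _number(size, "width"), _number(size, "height")
    if width is None or height is None or width <= 0 or height <= 0:
        problems.append("missing or invalid image size")
        width = height = None  # skip the bounds check below

    objects = root.findall("object")
    if not objects:
        problems.append("no objects")
    for obj in objects:
        name = (obj.findtext("name") or "").strip()
        if name not in class_names:
            problems.append(f"unknown class {name!r}")
        coords = [_number(obj.find("bndbox"), tag) for tag in COORD_TAGS]
        if None in coords:
            problems.append(f"{name!r}: missing or non-numeric bndbox")
            continue
        xmin, ymin, xmax, ymax = coords
        if xmin >= xmax or ymin >= ymax:
            problems.append(f"{name!r}: empty or inverted box")
        elif width is not None and (xmin < 0 or ymin < 0 or xmax > width or ymax > height):
            problems.append(f"{name!r}: box outside {width:g}x{height:g} image")
    return problems


def check_annotations(annotation_dir, class_names):
    """Map each problematic XML file name to its list of problems."""
    report = {}
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        problems = check_annotation(xml_path, class_names)
        if problems:
            report[xml_path.name] = problems
    return report
```

Usage would look like `check_annotations("data/Food/Annotations", {"class_one", "class_two"})`, replacing the placeholder names with the two real class names; an empty dict means no file was flagged.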