Issue in experimenting with transfer learning

sunil76joshi · June 16, 2020, 3:51pm

Hi,
I am trying to train the network on more custom training data, but I keep getting "Segmentation fault (core dumped)".
Any idea what could be the reason?

I tried to follow the steps from the YouTube video "Getting Started with NVIDIA Jetson Nano Part 2: Image Classification | Digi-Key Electronics".

crevavi@crevavi-desktop:~/jetson-inference/python/training/classification$ python train.py --model-dir=utensils ~/datasets/utensils/ --epochs=1 --batch-size=4
Use GPU: 0 for training
=> dataset classes: 3 ['background', 'fork', 'spoon']
=> using pre-trained model 'resnet18'
=> reshaped ResNet fully-connected layer with: Linear(in_features=512, out_features=3, bias=True)
Epoch: [0][ 0/23] Time 60.684 (60.684) Data 1.600 ( 1.600) Loss 1.6361e+00 (1.6361e+00) Acc@1 25.00 ( 25.00) Acc@5 100.00 (100.00)
Epoch: [0][10/23] Time 0.751 ( 6.571) Data 0.000 ( 0.157) Loss 1.4579e+00 (1.4134e+01) Acc@1 50.00 ( 36.36) Acc@5 100.00 (100.00)
Epoch: [0][20/23] Time 0.752 ( 3.801) Data 0.000 ( 0.102) Loss 4.4567e+00 (1.6097e+01) Acc@1 50.00 ( 35.71) Acc@5 100.00 (100.00)
Epoch: [0] completed, elapsed time 85.990 seconds
Test: [ 0/23] Time 2.100 ( 2.100) Loss 2.8461e+05 (2.8461e+05) Acc@1 0.00 ( 0.00) Acc@5 100.00 (100.00)
Test: [10/23] Time 0.273 ( 0.438) Loss 0.0000e+00 (2.4456e+05) Acc@1 100.00 ( 29.55) Acc@5 100.00 (100.00)
Test: [20/23] Time 0.268 ( 0.359) Loss 1.3643e+05 (1.6559e+05) Acc@1 0.00 ( 35.71) Acc@5 100.00 (100.00)

Acc@1 32.967 Acc@5 100.000
saved best model to: utensils/model_best.pth.tar
Segmentation fault (core dumped)
crevavi@crevavi-desktop:~/jetson-inference/python/training/classification$

dusty_nv · June 16, 2020, 4:20pm

Hi @sunil76joshi, the error only occurs after training has completed, so you can ignore it for now.

You may want to train it for more than one epoch to see if the accuracy improves.

sunil76joshi · June 17, 2020, 2:53pm

Hi @dusty_nv,
Thanks for the quick response. I tried the default 34 epochs as well and am facing the same issue.

After I ran
python train.py --model-dir=utensils ~/datasets/utensils/

I went ahead with the next command:
python onnx_export.py --model-dir=utensils
and then
imagenet-camera --model=utensils/resnet18.onnx --input_blob=input_0 --output_blob=output0 --lables=/home/crevavi/datasets/utensils/labels.txt --camera=/dev/video0 --width=640 --height=480

I keep getting a segmentation fault at every step.
…

%output_0 : Float(1, 3) = onnx::Softmaxaxis=1 # /home/crevavi/.local/lib/python3.6/site-packages/torch/nn/functional.py:1231:0
return (%output_0)

model exported to: utensils/resnet18.onnx
Segmentation fault (core dumped)
crevavi@crevavi-desktop:~/jetson-inference/python/training/classification$ imagenet-camera --model=utensils/resnet18.onnx --input_blob=input_0 --output_blob=output0 --lables=/home/crevavi/datasets/utensils/labels.txt --camera=/dev/video0 --width=640 --height=480
[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVARGUS, camera /dev/video0
[gstreamer] gstCamera pipeline string:
v4l2src device=/dev/video0 ! video/x-raw, width=(int)640, height=(int)480, format=YUY2 ! videoconvert ! video/x-raw, format=RGB ! videoconvert !appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_V4L2, camera /dev/video0

imagenet-camera: successfully initialized camera device
width: 640
height: 480
depth: 24 (bpp)

[TRT] imageNet -- failed to initialize.
imagenet-console: failed to initialize imageNet
crevavi@crevavi-desktop:~/jetson-inference/python/training/classification$

dusty_nv · June 17, 2020, 3:13pm

There is a typo in your command line: --lables should be --labels.
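
A mistyped flag like this one is easy to miss by eye. As a rough sketch (not part of jetson-inference), a few lines of Python can compare an argument against a list of expected flag names and suggest the nearest one. The `KNOWN_FLAGS` list and the `suggest_flag` helper are hypothetical; the list only contains the flags used in this thread, not the tool's full set of options.

```python
import difflib

# Flags that appear in the commands in this thread.
# Assumption: this is NOT the complete option list of imagenet-camera.
KNOWN_FLAGS = ["model", "input_blob", "output_blob", "labels",
               "camera", "width", "height"]


def suggest_flag(arg, known=KNOWN_FLAGS):
    """Return the closest known flag for a mistyped '--name=value' argument.

    Returns None when the flag is already known or nothing is similar enough.
    """
    name = arg.lstrip("-").split("=", 1)[0]
    if name in known:
        return None  # spelled correctly, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return "--" + matches[0] if matches else None


if __name__ == "__main__":
    print(suggest_flag("--lables=/home/crevavi/datasets/utensils/labels.txt"))
```

The similarity cutoff of 0.6 is simply the `difflib` default; a stricter value would give fewer, safer suggestions.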

I also recommend trying imagenet-console on a test image first, before jumping to the camera.

sunil76joshi · June 17, 2020, 4:26pm

oops… my bad…
I tried

imagenet-console --model=utensils/resnet18.onnx --input_blob=input_0 --output_blob=output0 --labels=/home/crevavi/datasets/utensils/labels.txt 16062020-153825.jpg

I see
[TRT] binding -- index 1
 -- name 'output_0'
 -- type FP32
 -- in/out OUTPUT
 -- # dims 2
 -- dim #0 1 (SPATIAL)
 -- dim #1 3 (SPATIAL)
[TRT] binding to input 0 input_0 binding index: 0
[TRT] binding to input 0 input_0 dims (b=1 c=3 h=224 w=224) size=602112
[TRT] INVALID_ARGUMENT: Cannot find binding of given name: output0
[TRT] binding to output 0 output0 binding index: -1
[TRT] Parameter check failed at: engine.cpp::getBindingDimensions::1977, condition: bindIndex >= 0 && bindIndex < getNbBindings()
[TRT] binding to output 0 output0 dims (b=1 c=1 h=1 w=1) size=4
device GPU, utensils/resnet18.onnx initialized.
[TRT] utensils/resnet18.onnx loaded
imageNet -- loaded 3 class info entries
imageNet -- didn't load expected number of class descriptions (3 of 1)
imageNet -- failed to load synset class descriptions (3 / 3 of 1)
[TRT] imageNet -- failed to initialize.
imagenet-console: failed to initialize imageNet

I wonder if there is a memory allocation issue…

dusty_nv · June 17, 2020, 4:37pm

I think this is missing an underscore, it should be --output_blob=output_0 instead.
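
The log already lists the binding names the engine really has, so the mismatch can be checked mechanically. Here is a minimal sketch, assuming the log format shown in this thread (a "name" line under each "[TRT] binding" entry). The helpers `binding_names` and `resolve_blob` are hypothetical and not part of jetson-inference or TensorRT.

```python
import difflib
import re

# Matches lines such as:  -- name 'output_0'
# (accepts straight quotes or the typographic opening quote U+2018)
NAME_RE = re.compile(r"\bname\s+['\"\u2018]?(\w+)")

# Excerpt of the TensorRT log from this thread (dashes and quotes normalized).
LOG = """\
[TRT] binding -- index 1
 -- name 'output_0'
 -- type FP32
 -- in/out OUTPUT
[TRT] INVALID_ARGUMENT: Cannot find binding of given name: output0
"""


def binding_names(log_text):
    """Collect the binding names that the engine reports in its log."""
    return NAME_RE.findall(log_text)


def resolve_blob(requested, log_text):
    """Return `requested` if the engine reports it, else the closest
    reported binding name, or None when nothing is similar."""
    names = binding_names(log_text)
    if requested in names:
        return requested
    close = difflib.get_close_matches(requested, names, n=1)
    return close[0] if close else None


if __name__ == "__main__":
    print(binding_names(LOG))
    print(resolve_blob("output0", LOG))
```

The "name:" in the INVALID_ARGUMENT line is skipped on purpose, because the pattern requires whitespace directly after "name".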

sunil76joshi · June 17, 2020, 4:51pm

Oh yes, I made many typos… sorry for bothering you with that…
It looks like it is working now…
I will do more training stuff and look for better accuracy.

Thanks a lot for super fast support !! I really appreciate it!

dusty_nv · June 17, 2020, 5:10pm

No problem, glad you got it working!

You can ignore the PyTorch crashes for now, it only happens when PyTorch exits. I am trying to figure out why it happens.
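
If you launch training from a script, one way to tolerate the crash on exit is to accept a segfault only when the checkpoint was written. This is a rough sketch, not something train.py provides. It assumes the script is started with Python's `subprocess` module on Linux, where a process killed by a signal has a negative return code, and that the file name matches the "saved best model to" line in the log above. The helper `training_usable` is hypothetical.

```python
import os
import signal
import tempfile


def training_usable(returncode, model_dir, checkpoint="model_best.pth.tar"):
    """Decide whether a finished training run produced a usable model.

    True if the checkpoint exists and the process either exited cleanly
    or was killed by SIGSEGV (the crash-on-exit case from this thread).
    """
    saved = os.path.isfile(os.path.join(model_dir, checkpoint))
    if returncode == 0:
        return saved
    # subprocess reports death-by-signal as a negative return code on POSIX
    return returncode == -signal.SIGSEGV and saved


if __name__ == "__main__":
    # Demo with a temporary directory standing in for the model directory.
    with tempfile.TemporaryDirectory() as model_dir:
        print(training_usable(-signal.SIGSEGV, model_dir))  # no checkpoint yet
        open(os.path.join(model_dir, "model_best.pth.tar"), "wb").close()
        print(training_usable(-signal.SIGSEGV, model_dir))  # checkpoint present
```

Note that a checkpoint left over from an earlier run would also pass this check, so comparing the file's modification time against the start of the run would make it more robust.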

kwok.paul · June 21, 2020, 10:56pm

I am also getting the segmentation fault error when running the sample classification training for Plant, Cat and Dog in NVIDIA's Hello AI World.

Epoch: [34][0/8] Time 0.470 ( 0.470) Data 0.356 ( 0.356) Loss 0.0000e+00 (0.0000e+00) Acc@1 100.00 (100.00) Acc@5 100.00 (100.00)
Epoch: [34] completed, elapsed time 1.620 seconds
Test: [0/2] Time 0.423 ( 0.423) Loss 0.0000e+00 (0.0000e+00) Acc@1 100.00 (100.00) Acc@5 100.00 (100.00)

Acc@1 100.000 Acc@5 100.000
saved checkpoint to: perrier/checkpoint.pth.tar
Segmentation fault (core dumped)

kayccc · July 2, 2020, 3:41am

Hi kwok.paul,

Please open a new topic for your issue. Thanks
