Illegal instruction

rajaneconsys · June 1, 2020, 6:11am

Hi all,
While training the unpruned PeopleNet purpose-built model, I am getting an "Illegal instruction (core dumped)" error.
I am attaching my train spec file as well: peoplenet_train_resnet34_kitti.txt (4.7 KB)
Please help me solve this error. Thanks in advance.

Morganh · June 1, 2020, 7:23am

Please paste your full running log. Thanks.

rajaneconsys · June 1, 2020, 8:30am

Hi @Morganh
Thanks for the reply. Please find the logs below.
Using TensorFlow backend.
2020-06-01 08:21:31,695 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/examples/detectnet_v2/specs/peoplenet_train_resnet34_kitti.txt.
2020-06-01 08:21:31,696 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/examples/detectnet_v2/specs/peoplenet_train_resnet34_kitti.txt

Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) (None, 3, 544, 960) 0

conv1 (Conv2D) (None, 64, 272, 480) 9472 input_1[0][0]

bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0]

activation_1 (Activation) (None, 64, 272, 480) 0 bn_conv1[0][0]

block_1a_conv_1 (Conv2D) (None, 64, 136, 240) 36928 activation_1[0][0]

block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_1[0][0]

block_1a_relu_1 (Activation) (None, 64, 136, 240) 0 block_1a_bn_1[0][0]

block_1a_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu_1[0][0]

block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 activation_1[0][0]

block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_2[0][0]

block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256 block_1a_conv_shortcut[0][0]

add_1 (Add) (None, 64, 136, 240) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]

block_1a_relu (Activation) (None, 64, 136, 240) 0 add_1[0][0]

block_1b_conv_1 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu[0][0]

block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_1[0][0]

block_1b_relu_1 (Activation) (None, 64, 136, 240) 0 block_1b_bn_1[0][0]

block_1b_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1b_relu_1[0][0]

block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_2[0][0]

add_2 (Add) (None, 64, 136, 240) 0 block_1b_bn_2[0][0]
block_1a_relu[0][0]

block_1b_relu (Activation) (None, 64, 136, 240) 0 add_2[0][0]

block_1c_conv_1 (Conv2D) (None, 64, 136, 240) 36928 block_1b_relu[0][0]

block_1c_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1c_conv_1[0][0]

block_1c_relu_1 (Activation) (None, 64, 136, 240) 0 block_1c_bn_1[0][0]

block_1c_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1c_relu_1[0][0]

block_1c_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1c_conv_2[0][0]

add_3 (Add) (None, 64, 136, 240) 0 block_1c_bn_2[0][0]
block_1b_relu[0][0]

block_1c_relu (Activation) (None, 64, 136, 240) 0 add_3[0][0]

block_2a_conv_1 (Conv2D) (None, 128, 68, 120) 73856 block_1c_relu[0][0]

block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_1[0][0]

block_2a_relu_1 (Activation) (None, 128, 68, 120) 0 block_2a_bn_1[0][0]

block_2a_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu_1[0][0]

block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320 block_1c_relu[0][0]

block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_2[0][0]

block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512 block_2a_conv_shortcut[0][0]

add_4 (Add) (None, 128, 68, 120) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]

block_2a_relu (Activation) (None, 128, 68, 120) 0 add_4[0][0]

block_2b_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu[0][0]

block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_1[0][0]

block_2b_relu_1 (Activation) (None, 128, 68, 120) 0 block_2b_bn_1[0][0]

block_2b_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2b_relu_1[0][0]

block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_2[0][0]

add_5 (Add) (None, 128, 68, 120) 0 block_2b_bn_2[0][0]
block_2a_relu[0][0]

block_2b_relu (Activation) (None, 128, 68, 120) 0 add_5[0][0]

block_2c_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2b_relu[0][0]

block_2c_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2c_conv_1[0][0]

block_2c_relu_1 (Activation) (None, 128, 68, 120) 0 block_2c_bn_1[0][0]

block_2c_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2c_relu_1[0][0]

block_2c_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2c_conv_2[0][0]

add_6 (Add) (None, 128, 68, 120) 0 block_2c_bn_2[0][0]
block_2b_relu[0][0]

block_2c_relu (Activation) (None, 128, 68, 120) 0 add_6[0][0]

block_2d_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2c_relu[0][0]

block_2d_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2d_conv_1[0][0]

block_2d_relu_1 (Activation) (None, 128, 68, 120) 0 block_2d_bn_1[0][0]

block_2d_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2d_relu_1[0][0]

block_2d_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2d_conv_2[0][0]

add_7 (Add) (None, 128, 68, 120) 0 block_2d_bn_2[0][0]
block_2c_relu[0][0]

block_2d_relu (Activation) (None, 128, 68, 120) 0 add_7[0][0]

block_3a_conv_1 (Conv2D) (None, 256, 34, 60) 295168 block_2d_relu[0][0]

block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_1[0][0]

block_3a_relu_1 (Activation) (None, 256, 34, 60) 0 block_3a_bn_1[0][0]

block_3a_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu_1[0][0]

block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60) 33024 block_2d_relu[0][0]

block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_2[0][0]

block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60) 1024 block_3a_conv_shortcut[0][0]

add_8 (Add) (None, 256, 34, 60) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]

block_3a_relu (Activation) (None, 256, 34, 60) 0 add_8[0][0]

block_3b_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu[0][0]

block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_1[0][0]

block_3b_relu_1 (Activation) (None, 256, 34, 60) 0 block_3b_bn_1[0][0]

block_3b_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3b_relu_1[0][0]

block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_2[0][0]

add_9 (Add) (None, 256, 34, 60) 0 block_3b_bn_2[0][0]
block_3a_relu[0][0]

block_3b_relu (Activation) (None, 256, 34, 60) 0 add_9[0][0]

block_3c_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3b_relu[0][0]

block_3c_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3c_conv_1[0][0]

block_3c_relu_1 (Activation) (None, 256, 34, 60) 0 block_3c_bn_1[0][0]

block_3c_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3c_relu_1[0][0]

block_3c_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3c_conv_2[0][0]

add_10 (Add) (None, 256, 34, 60) 0 block_3c_bn_2[0][0]
block_3b_relu[0][0]

block_3c_relu (Activation) (None, 256, 34, 60) 0 add_10[0][0]

block_3d_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3c_relu[0][0]

block_3d_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3d_conv_1[0][0]

block_3d_relu_1 (Activation) (None, 256, 34, 60) 0 block_3d_bn_1[0][0]

block_3d_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3d_relu_1[0][0]

block_3d_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3d_conv_2[0][0]

add_11 (Add) (None, 256, 34, 60) 0 block_3d_bn_2[0][0]
block_3c_relu[0][0]

block_3d_relu (Activation) (None, 256, 34, 60) 0 add_11[0][0]

block_3e_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3d_relu[0][0]

block_3e_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3e_conv_1[0][0]

block_3e_relu_1 (Activation) (None, 256, 34, 60) 0 block_3e_bn_1[0][0]

block_3e_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3e_relu_1[0][0]

block_3e_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3e_conv_2[0][0]

add_12 (Add) (None, 256, 34, 60) 0 block_3e_bn_2[0][0]
block_3d_relu[0][0]

block_3e_relu (Activation) (None, 256, 34, 60) 0 add_12[0][0]

block_3f_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3e_relu[0][0]

block_3f_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3f_conv_1[0][0]

block_3f_relu_1 (Activation) (None, 256, 34, 60) 0 block_3f_bn_1[0][0]

block_3f_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3f_relu_1[0][0]

block_3f_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3f_conv_2[0][0]

add_13 (Add) (None, 256, 34, 60) 0 block_3f_bn_2[0][0]
block_3e_relu[0][0]

block_3f_relu (Activation) (None, 256, 34, 60) 0 add_13[0][0]

block_4a_conv_1 (Conv2D) (None, 512, 34, 60) 1180160 block_3f_relu[0][0]

block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_1[0][0]

block_4a_relu_1 (Activation) (None, 512, 34, 60) 0 block_4a_bn_1[0][0]

block_4a_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu_1[0][0]

block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60) 131584 block_3f_relu[0][0]

block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_2[0][0]

block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60) 2048 block_4a_conv_shortcut[0][0]

add_14 (Add) (None, 512, 34, 60) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]

block_4a_relu (Activation) (None, 512, 34, 60) 0 add_14[0][0]

block_4b_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu[0][0]

block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_1[0][0]

block_4b_relu_1 (Activation) (None, 512, 34, 60) 0 block_4b_bn_1[0][0]

block_4b_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4b_relu_1[0][0]

block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_2[0][0]

add_15 (Add) (None, 512, 34, 60) 0 block_4b_bn_2[0][0]
block_4a_relu[0][0]

block_4b_relu (Activation) (None, 512, 34, 60) 0 add_15[0][0]

block_4c_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 block_4b_relu[0][0]

block_4c_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4c_conv_1[0][0]

block_4c_relu_1 (Activation) (None, 512, 34, 60) 0 block_4c_bn_1[0][0]

block_4c_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4c_relu_1[0][0]

block_4c_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4c_conv_2[0][0]

add_16 (Add) (None, 512, 34, 60) 0 block_4c_bn_2[0][0]
block_4b_relu[0][0]

block_4c_relu (Activation) (None, 512, 34, 60) 0 add_16[0][0]

output_bbox (Conv2D) (None, 12, 34, 60) 6156 block_4c_relu[0][0]

output_cov (Conv2D) (None, 3, 34, 60) 1539 block_4c_relu[0][0]

Total params: 21,322,319
Trainable params: 21,295,695
Non-trainable params: 26,624

target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-06-01 08:22:23,562 [INFO] iva.detectnet_v2.scripts.train: Found 6434 samples in training set
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-06-01 08:22:30,746 [INFO] iva.detectnet_v2.scripts.train: Found 1047 samples in validation set
/usr/local/bin/tlt-train: line 32: 1564 Illegal instruction (core dumped) tlt-train-g1 ${PYTHON_ARGS[*]}

Morganh · June 1, 2020, 8:56am

Please consider:

Are you using a type of CPU that the TensorFlow package in the TLT container does not support?
See "Core dump Illegal Instruction on detectnet_v2 example".
In the 2.0 docker, in the Jupyter notebook, did you run the default resnet18 against the KITTI dataset successfully?

rajaneconsys · June 1, 2020, 9:29am

No, I am not able to run the default resnet18 against the KITTI dataset. That also shows the same error: Illegal instruction (core dumped).

I have an Intel® Core™ i5-3470S CPU and a GeForce GT 1030/PCIe/SSE2 GPU,
and I am running Ubuntu 18.04.4 LTS.

Morganh · June 1, 2020, 5:49pm

So, please check with the help of the topic "Core dump Illegal Instruction on detectnet_v2 example",
or please try another host PC.
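
One way to look at the CPU side of this on a Linux host is to list which instruction-set flags the processor reports. The following is a minimal sketch, not an official TLT check: the flag list in `CANDIDATE_FLAGS` is an assumption (extensions that prebuilt TensorFlow binaries commonly rely on), since this thread does not state which ones the container's TensorFlow build actually requires, and the helper names are made up for illustration.

```python
from pathlib import Path

# Assumption: extensions a prebuilt TensorFlow binary may have been compiled with.
# The thread does not say which of these the TLT container's build needs.
CANDIDATE_FLAGS = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; all cores normally report the same set.
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=CANDIDATE_FLAGS):
    """List the wanted flags that the CPU does not report, in the order given."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("Missing flags:", missing_flags(cpuinfo.read_text()) or "none")
    else:
        print("/proc/cpuinfo not found (this sketch only works on Linux)")
```

If a flag is reported missing, that is only a hint that the binary may use an instruction the CPU lacks; trying another host PC, as suggested above, remains the direct test.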

rajaneconsys · June 4, 2020, 8:13pm

Hi @Morganh
I am able to run training on another PC, and I am not getting any illegal instruction error there.
I trained the unpruned_peoplenet model with the KITTI dataset, but after training the output looks like this:
Validation cost: -0.000010
Mean average_precision (in %): 0.0000

class name average precision (in %)

bag 0
face 0
person 0

The precision for every class is 0. Can you please help me with that? I am attaching the train specification file as well: peoplenet_train_resnet34_kitti.txt (4.7 KB)

Morganh · June 5, 2020, 2:03am

@rajaneconsys
The KITTI dataset has no bag, face, or person class names.
Please check the class names reported when you run tlt-dataset-convert.

For your reference, see the log below.

2020-05-05 15:49:56,728 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2020-05-05 15:49:57,488 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2020-05-05 15:49:58,155 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2020-05-05 15:49:58,827 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2020-05-05 15:49:59,531 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2020-05-05 15:50:00,068 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2020-05-05 15:50:00,692 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2020-05-05 15:50:01,305 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
2020-05-05 15:50:01,898 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2020-05-05 15:50:02,568 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
cyclist: 265
van: 444
tram: 72
car: 4197
misc: 151
pedestrian: 682
truck: 166
person_sitting: 35
dontcare: 1582

2020-05-05 15:50:02,568 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
2020-05-05 15:50:06,224 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
2020-05-05 15:50:09,532 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2020-05-05 15:50:13,013 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
2020-05-05 15:50:17,280 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
2020-05-05 15:50:21,366 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
2020-05-05 15:50:25,274 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
2020-05-05 15:50:29,012 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
2020-05-05 15:50:32,670 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2020-05-05 15:50:36,280 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
2020-05-05 15:50:40,081 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
cyclist: 1362
van: 2470
tram: 439
car: 24545
misc: 822
pedestrian: 3805
truck: 928
person_sitting: 187
dontcare: 9713

2020-05-05 15:50:40,081 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2020-05-05 15:50:40,081 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
cyclist: 1627
van: 2914
tram: 511
car: 28742
misc: 973
pedestrian: 4487
truck: 1094
person_sitting: 222
dontcare: 11295

2020-05-05 15:50:40,081 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map.
Label in GT: Label in tfrecords file
Cyclist: cyclist
Van: van
Tram: tram
Car: car
Misc: misc
Pedestrian: pedestrian
Truck: truck
Person_sitting: person_sitting
DontCare: dontcare
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.
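
The mismatch described above can be checked mechanically by comparing the class names used in the spec with the labels the converter wrote. This is a rough sketch under stated assumptions: the spec file itself is not shown in this thread, so the `target_class_mapping { key: ... value: ... }` layout and the sample spec text below are assumed for illustration, and `mapping_keys` and `unknown_keys` are hypothetical helper names, not part of TLT.

```python
import re

# Labels taken from the "Label in tfrecords file" column of the class map above.
TFRECORD_LABELS = {
    "cyclist", "van", "tram", "car", "misc",
    "pedestrian", "truck", "person_sitting", "dontcare",
}


def mapping_keys(spec_text):
    """Collect the key of every target_class_mapping block in the spec text."""
    pattern = re.compile(r'target_class_mapping\s*\{\s*key\s*:\s*"([^"]+)"')
    return set(pattern.findall(spec_text))


def unknown_keys(spec_text, labels=TFRECORD_LABELS):
    """Return spec keys that never appear as a label in the tfrecords, sorted."""
    return sorted(mapping_keys(spec_text) - set(labels))


# Assumed spec fragment using the PeopleNet class names from the post above.
SAMPLE_SPEC = """
dataset_config {
  target_class_mapping { key: "person" value: "person" }
  target_class_mapping { key: "bag" value: "bag" }
  target_class_mapping { key: "face" value: "face" }
}
"""

if __name__ == "__main__":
    print("Keys with no matching tfrecords label:", unknown_keys(SAMPLE_SPEC))
```

Any key this reports has no ground-truth objects behind it in the converted dataset, which is consistent with every class showing an average precision of 0.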

rajaneconsys · June 5, 2020, 5:22am

Thanks @Morganh.
