Background:
- I followed the instructions at https://github.com/dusty-nv/jetson-inference to install DIGITS, JetPack, and the other packages on the host and on the Jetson TX2.
- I followed the instructions at https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection to train DetectNet.
- I trained the original DetectNet with GoogLeNet as the convolutional layers and successfully ran the model on the TX2 using detectnet-console.
- To make DetectNet run faster, I replaced GoogLeNet with AlexNet and trained a customized model.
- When I ran the customized model on the TX2 using detectnet-console, I got an error.
Problem: When trying to load my customized DetectNet model, I get the following error message:
*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000265cd8e0 ***
Aborted (core dumped)
When I print all the debug messages, I get the following output:
detectnet-console
args (8): 0 [./detectnet-console] 1 [/home/nvidia/Downloads/000021.png] 2 [output1.jpg] 3 [--prototxt=/home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt] 4 [--model=/home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel] 5 [--input_blob=data] 6 [--output_cvg=coverage] 7 [--output_bbox=bboxes]
detectNet -- loading detection network model from:
-- prototxt /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt
-- model /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- threshold 0.500000
-- batch_size 2
[GIE] attempting to open cache file /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel.2.tensorcache
[GIE] cache file not found, profiling network model
[GIE] platform has FP16 support.
[GIE] loading /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel
[GIE] retrieved output tensor 'coverage'
[GIE] retrieved output tensor 'bboxes'
[GIE] configuring CUDA engine
[GIE] building CUDA engine
[GIE] Original: 25 layers
[GIE] After dead-layer removal: 25 layers
[GIE] After scale fusion: 25 layers
[GIE] Fusing conv1 with activation relu1
[GIE] Fusing conv2 with activation relu2
[GIE] Fusing conv3 with activation relu3
[GIE] Fusing conv4 with activation relu4
[GIE] Fusing conv5 with activation relu5
[GIE] Fusing conv-post1 with activation relu-post1
[GIE] After conv-act fusion: 19 layers
[GIE] After tensor merging: 19 layers
[GIE] After concat removal: 19 layers
[GIE] Region transformed_data: NC2HW_F16
[GIE] Region bn0: NC2HW_F16
[GIE] Region conv1: NC2HW_F16
[GIE] Region pool1: NC2HW_F16
[GIE] Region bn1: NC2HW_F16
[GIE] Region conv2: NC2HW_F16
[GIE] Region pool2: NC2HW_F16
[GIE] Region bn2: NC2HW_F16
[GIE] Region conv3: NC2HW_F16
[GIE] Region bn3: NC2HW_F16
[GIE] Region conv4: NC2HW_F16
[GIE] Region bn4: NC2HW_F16
[GIE] Region conv5: NC2HW_F16
[GIE] Region pool5: NC2HW_F16
[GIE] Region conv-post1: NC2HW_F16
[GIE] Region bn-post1: NC2HW_F16
[GIE] Region cvg/classifier: NC2HW_F16
[GIE] Region data: NC2HW_F16
[GIE] Region transformed_data: NC2HW_F16
[GIE] Region bn0: NC2HW_F16
[GIE] Region conv1: NC2HW_F16
[GIE] Region pool1: NC2HW_F16
[GIE] Region bn1: NC2HW_F16
[GIE] Region conv2: NC2HW_F16
[GIE] Region pool2: NC2HW_F16
[GIE] Region bn2: NC2HW_F16
[GIE] Region conv3: NC2HW_F16
[GIE] Region bn3: NC2HW_F16
[GIE] Region conv4: NC2HW_F16
[GIE] Region bn4: NC2HW_F16
[GIE] Region conv5: NC2HW_F16
[GIE] Region pool5: NC2HW_F16
[GIE] Region conv-post1: NC2HW_F16
[GIE] Region bn-post1: NC2HW_F16
[GIE] Region cvg/classifier: NC2HW_F16
[GIE] Region coverage: NC2HW_F16
[GIE] Region bboxes: NC2HW_F16
[GIE]
[GIE] Node deploy_transform: NC2HW_F16
[GIE] Node bn0: NC2HW_F16
[GIE] Node conv1 + relu1: NC2HW_F16
[GIE] Node pool1: NC2HW_F16
[GIE] Node bn1: NC2HW_F16
[GIE] Node conv2 + relu2: NC2HW_F16
[GIE] Node pool2: NC2HW_F16
[GIE] Node bn2: NC2HW_F16
[GIE] Node conv3 + relu3: NC2HW_F16
[GIE] Node bn3: NC2HW_F16
[GIE] Node conv4 + relu4: NC2HW_F16
[GIE] Node bn4: NC2HW_F16
[GIE] Node conv5 + relu5: NC2HW_F16
[GIE] Node pool5: NC2HW_F16
[GIE] Node conv-post1 + relu-post1: NC2HW_F16
[GIE] Node bn-post1: NC2HW_F16
[GIE] Node cvg/classifier: NC2HW_F16
[GIE] Node coverage/sig: NC2HW_F16
[GIE] Node bbox/regressor: NC2HW_F16
[GIE]
[GIE] Adding reformat layer: deploy_transform reformatted input 0 (data) from NCHW_F32 to NC2HW_F16
[GIE] Adding reformat layer: coverage/sig reformatted output 0 (coverage) from NC2HW_F16 to NCHW_F32
[GIE] Adding reformat layer: bbox/regressor reformatted output 0 (bboxes) from NC2HW_F16 to NCHW_F32
[GIE] After reformat layers: 22 layers
[GIE] Block size 524288000
[GIE] Block size 11501568
[GIE] Block size 7667712
[GIE] Block size 7667712
[GIE] Total Activation Memory: 551124992
[GIE]
[GIE] --------------- Timing deploy_transform input reformatter 0(9)
[GIE] Tactic 0 is the only option, timing skipped
[GIE]
[GIE] --------------- Timing deploy_transform(10)
[GIE] Tactic 0 is the only option, timing skipped
[GIE]
[GIE] --------------- Timing bn0(10)
[GIE] Tactic 0 is the only option, timing skipped
[GIE]
[GIE] --------------- Timing conv1 + relu1(3)
*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000265cd8e0 ***
Aborted (core dumped)
I then stepped through the program in a debugger and found that it exits at tensorNet.cpp line 166:
nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
According to the debug information, I think the prototxt and model are loaded correctly. The error occurred while TensorRT was “timing” the conv1 + relu1 layer.
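As a side check on the log above, the four "Block size" values can be added up and compared with the reported "Total Activation Memory". The sketch below does only that arithmetic. The numbers are copied from the log; the names kBlockSizes, kReportedTotal, sumBlockSizes, toMiB, and printActivationSummary are my own and are not part of jetson-inference or TensorRT. It does not explain the crash; it only shows that the first block alone is 500 MiB and that the blocks account for the whole reported total.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <numeric>

// Block sizes in bytes, copied from the "[GIE] Block size" lines of the log.
constexpr std::array<std::uint64_t, 4> kBlockSizes = {
    524288000ULL, 11501568ULL, 7667712ULL, 7667712ULL};

// Value printed by "[GIE] Total Activation Memory" in the same log.
constexpr std::uint64_t kReportedTotal = 551124992ULL;

// Sum of all block sizes, in bytes.
std::uint64_t sumBlockSizes() {
    return std::accumulate(kBlockSizes.begin(), kBlockSizes.end(),
                           std::uint64_t{0});
}

// Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes).
double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Print each block, the computed total, and whether it matches the log.
void printActivationSummary() {
    for (std::uint64_t size : kBlockSizes) {
        std::printf("block: %llu bytes (%.2f MiB)\n",
                    static_cast<unsigned long long>(size), toMiB(size));
    }
    const std::uint64_t total = sumBlockSizes();
    std::printf("sum:   %llu bytes (%.2f MiB)\n",
                static_cast<unsigned long long>(total), toMiB(total));
    std::printf("matches reported total: %s\n",
                total == kReportedTotal ? "yes" : "no");
}
```

Calling printActivationSummary() from a main function prints the per-block sizes and confirms whether the sum equals the total reported in the log.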
You can find all the files needed to analyze and reproduce the problem, including deploy.prototxt and the snapshot model, here:
https://drive.google.com/open?id=1QL2Dm0LHk7ymaUubJ4aYiuT6Py13kk0Y
Please help!
Thank you.