Detectnet-console failed to profile custom model on TX2

MingDu · November 8, 2017, 6:17am

Background:

I followed the instructions at https://github.com/dusty-nv/jetson-inference to install DIGITS, JetPack, and the other packages on the host and on the Jetson TX2.
I followed the instructions at https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection to train DetectNet.
I trained the original DetectNet, which uses GoogLeNet for its convolutional layers, and ran the model successfully on the TX2 with detectnet-console.
To make DetectNet run faster, I replaced GoogLeNet with AlexNet and trained a customized model.
When I ran the customized model on the TX2 with detectnet-console, I got an error.

Problem: When I try to load my customized DetectNet model, I get the following error message:

*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000265cd8e0 ***
Aborted (core dumped)

With all debug messages printed, the output is:

detectnet-console
  args (8):  0 [./detectnet-console]  1 [/home/nvidia/Downloads/000021.png]  2 [output1.jpg]  3 [--prototxt=/home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt]  4 [--model=/home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel]  5 [--input_blob=data]  6 [--output_cvg=coverage]  7 [--output_bbox=bboxes]  


detectNet -- loading detection network model from:
          -- prototxt    /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt
          -- model       /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel
          -- input_blob  'data'
          -- output_cvg  'coverage'
          -- output_bbox 'bboxes'
          -- mean_pixel  0.000000
          -- threshold   0.500000
          -- batch_size  2

[GIE]  attempting to open cache file /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel.2.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform has FP16 support.
[GIE]  loading /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel
[GIE]  retrieved output tensor 'coverage'
[GIE]  retrieved output tensor 'bboxes'
[GIE]  configuring CUDA engine
[GIE]  building CUDA engine
[GIE]  Original: 25 layers
[GIE]  After dead-layer removal: 25 layers
[GIE]  After scale fusion: 25 layers
[GIE]  Fusing  conv1 with activation relu1
[GIE]  Fusing  conv2 with activation relu2
[GIE]  Fusing  conv3 with activation relu3
[GIE]  Fusing  conv4 with activation relu4
[GIE]  Fusing  conv5 with activation relu5
[GIE]  Fusing  conv-post1 with activation relu-post1
[GIE]  After conv-act fusion: 19 layers
[GIE]  After tensor merging: 19 layers
[GIE]  After concat removal: 19 layers
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region bn0: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region bn1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region bn2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region bn3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region bn4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region bn-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region data: NC2HW_F16
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region bn0: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region bn1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region bn2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region bn3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region bn4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region bn-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region coverage: NC2HW_F16
[GIE]  Region bboxes: NC2HW_F16
[GIE]  
[GIE]  Node deploy_transform: NC2HW_F16
[GIE]  Node bn0: NC2HW_F16
[GIE]  Node conv1 + relu1: NC2HW_F16
[GIE]  Node pool1: NC2HW_F16
[GIE]  Node bn1: NC2HW_F16
[GIE]  Node conv2 + relu2: NC2HW_F16
[GIE]  Node pool2: NC2HW_F16
[GIE]  Node bn2: NC2HW_F16
[GIE]  Node conv3 + relu3: NC2HW_F16
[GIE]  Node bn3: NC2HW_F16
[GIE]  Node conv4 + relu4: NC2HW_F16
[GIE]  Node bn4: NC2HW_F16
[GIE]  Node conv5 + relu5: NC2HW_F16
[GIE]  Node pool5: NC2HW_F16
[GIE]  Node conv-post1 + relu-post1: NC2HW_F16
[GIE]  Node bn-post1: NC2HW_F16
[GIE]  Node cvg/classifier: NC2HW_F16
[GIE]  Node coverage/sig: NC2HW_F16
[GIE]  Node bbox/regressor: NC2HW_F16
[GIE]  
[GIE]  Adding reformat layer: deploy_transform reformatted input 0 (data) from NCHW_F32 to NC2HW_F16
[GIE]  Adding reformat layer: coverage/sig reformatted output 0 (coverage) from NC2HW_F16 to NCHW_F32
[GIE]  Adding reformat layer: bbox/regressor reformatted output 0 (bboxes) from NC2HW_F16 to NCHW_F32
[GIE]  After reformat layers: 22 layers
[GIE]  Block size 524288000
[GIE]  Block size 11501568
[GIE]  Block size 7667712
[GIE]  Block size 7667712
[GIE]  Total Activation Memory: 551124992
[GIE]  
[GIE]  --------------- Timing deploy_transform input reformatter 0(9)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing deploy_transform(10)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing bn0(10)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing conv1 + relu1(3)
*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000265cd8e0 ***
Aborted (core dumped)

I then stepped through the program in a debugger and found that it exits at tensorNet.cpp line 166:

nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

Judging by the debug output, I think the prototxt and model are loaded correctly. The error occurs while TensorRT is timing the conv1 + relu1 layer.
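
For anyone reading a similar log, the layer that was being timed at the moment of the crash is simply the last "Timing" banner before the abort message. The sketch below pulls that name out of a captured log. It assumes the banner format shown in this thread, and the function name lastTimedLayer is made up for illustration; it is not part of jetson-inference or TensorRT.

```cpp
#include <sstream>
#include <string>

// Return the name printed on the last "Timing" banner in the log,
// or an empty string if the log contains no such banner.
// Assumption: banners look like "[GIE]  --------------- Timing <name>(<number>)".
std::string lastTimedLayer(const std::string& log)
{
    const std::string marker = "--------------- Timing ";
    std::istringstream lines(log);
    std::string line;
    std::string last;
    while (std::getline(lines, line)) {
        const std::size_t pos = line.find(marker);
        if (pos == std::string::npos)
            continue;
        std::string name = line.substr(pos + marker.size());
        // Drop the trailing parenthesized number, e.g. "conv1 + relu1(3)".
        const std::size_t paren = name.rfind('(');
        if (paren != std::string::npos)
            name.erase(paren);
        last = name;
    }
    return last;
}
```

Fed the log above, this returns conv1 + relu1, which matches where the output stops.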

All the files needed to analyze and reproduce the problem, including deploy.prototxt and the snapshot model, are here:
https://drive.google.com/open?id=1QL2Dm0LHk7ymaUubJ4aYiuT6Py13kk0Y

Please help!
Thank you.

AastaLLL · November 9, 2017, 2:50am

Hi,

Does the original GoogLeNet-based detection network also have BatchNorm layers?
Batch normalization needs to be implemented with a Scale layer.
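
One way to read this advice: at inference time, batch normalization reduces to a per-channel scale and shift, which is the operation a Scale layer performs. The sketch below shows only that arithmetic. It is not the TensorRT or Caffe API; the names ScaleShift and foldBatchNorm are hypothetical, and the default epsilon is an assumption that should be replaced by whatever the trained network uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel affine parameters: y = scale * x + shift.
struct ScaleShift {
    std::vector<float> scale;
    std::vector<float> shift;
};

// Fold inference-time batch-norm statistics into a per-channel scale and shift.
// Batch norm computes y = gamma * (x - mean) / sqrt(variance + eps) + beta,
// which equals a * x + b with a = gamma / sqrt(variance + eps), b = beta - a * mean.
// All four vectors must have one entry per channel.
ScaleShift foldBatchNorm(const std::vector<float>& mean,
                         const std::vector<float>& variance,
                         const std::vector<float>& gamma,
                         const std::vector<float>& beta,
                         float eps = 1e-5f)  // assumed default, not taken from the thread
{
    ScaleShift out;
    out.scale.resize(mean.size());
    out.shift.resize(mean.size());
    for (std::size_t c = 0; c < mean.size(); ++c) {
        const float a = gamma[c] / std::sqrt(variance[c] + eps);
        out.scale[c] = a;
        out.shift[c] = beta[c] - a * mean[c];
    }
    return out;
}
```

If the network has no learned gamma and beta, passing 1 and 0 for every channel gives the plain normalization.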

By the way, are you using TensorRT 3?
Thanks.

MingDu · November 9, 2017, 3:33am

Hi AastaLLL,

Thanks for your reply.

The GoogLeNet-based network doesn’t have BN layers.
I use TensorRT 2.1, installed with JetPack.

After I posted the problem, I also suspected that the BN layers might be the cause.
So I deleted all the BN layers from the AlexNet-based model and retrained it.
The result is the same:

detectnet-console
  args (8):  0 [./detectnet-console]  1 [/home/nvidia/Downloads/000021.png]  2 [output1.jpg]  3 [--prototxt=/home/nvidia/models/detectnet/Alexnet/deploy.prototxt]  4 [--model=/home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel]  5 [--input_blob=data]  6 [--output_cvg=coverage]  7 [--output_bbox=bboxes]  


detectNet -- loading detection network model from:
          -- prototxt    /home/nvidia/models/detectnet/Alexnet/deploy.prototxt
          -- model       /home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel
          -- input_blob  'data'
          -- output_cvg  'coverage'
          -- output_bbox 'bboxes'
          -- mean_pixel  0.000000
          -- threshold   0.500000
          -- batch_size  2

[GIE]  attempting to open cache file /home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel.2.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform has FP16 support.
[GIE]  loading /home/nvidia/models/detectnet/Alexnet/deploy.prototxt /home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel
[GIE]  retrieved output tensor 'coverage'
[GIE]  retrieved output tensor 'bboxes'
[GIE]  configuring CUDA engine
[GIE]  building CUDA engine
[GIE]  Original: 19 layers
[GIE]  After dead-layer removal: 19 layers
[GIE]  After scale fusion: 19 layers
[GIE]  Fusing  conv1 with activation relu1
[GIE]  Fusing  conv2 with activation relu2
[GIE]  Fusing  conv3 with activation relu3
[GIE]  Fusing  conv4 with activation relu4
[GIE]  Fusing  conv5 with activation relu5
[GIE]  Fusing  conv-post1 with activation relu-post1
[GIE]  After conv-act fusion: 13 layers
[GIE]  After tensor merging: 13 layers
[GIE]  After concat removal: 13 layers
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region data: NC2HW_F16
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region coverage: NC2HW_F16
[GIE]  Region bboxes: NC2HW_F16
[GIE]  
[GIE]  Node deploy_transform: NC2HW_F16
[GIE]  Node conv1 + relu1: NC2HW_F16
[GIE]  Node pool1: NC2HW_F16
[GIE]  Node conv2 + relu2: NC2HW_F16
[GIE]  Node pool2: NC2HW_F16
[GIE]  Node conv3 + relu3: NC2HW_F16
[GIE]  Node conv4 + relu4: NC2HW_F16
[GIE]  Node conv5 + relu5: NC2HW_F16
[GIE]  Node pool5: NC2HW_F16
[GIE]  Node conv-post1 + relu-post1: NC2HW_F16
[GIE]  Node cvg/classifier: NC2HW_F16
[GIE]  Node coverage/sig: NC2HW_F16
[GIE]  Node bbox/regressor: NC2HW_F16
[GIE]  
[GIE]  Adding reformat layer: deploy_transform reformatted input 0 (data) from NCHW_F32 to NC2HW_F16
[GIE]  Adding reformat layer: coverage/sig reformatted output 0 (coverage) from NC2HW_F16 to NCHW_F32
[GIE]  Adding reformat layer: bbox/regressor reformatted output 0 (bboxes) from NC2HW_F16 to NCHW_F32
[GIE]  After reformat layers: 16 layers
[GIE]  Block size 524288000
[GIE]  Block size 11501568
[GIE]  Block size 7667712
[GIE]  Block size 3744
[GIE]  Total Activation Memory: 543461024
[GIE]  
[GIE]  --------------- Timing deploy_transform input reformatter 0(9)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing deploy_transform(10)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing conv1 + relu1(3)


*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000266d6bc0 ***
Aborted (core dumped)

I trained the model with DIGITS, which uses NVCaffe as its backend.
As far as I know, NVCaffe integrates batch normalization into a single BatchNorm layer, and I assumed the Caffe parser in TensorRT supports NVCaffe. Is that correct, or do I still need to use the Scale layer?

Ming

AastaLLL · November 10, 2017, 2:03am

Hi,

Thanks for the feedback.
We will test the custom model and get back to you with an update.

AastaLLL · November 10, 2017, 8:17am

Hi,

After turning off FP16 mode, we can run inference on your model without error.
Here is the change for your reference:

diff --git a/detectNet.cpp b/detectNet.cpp
index 824ae98..3cfeb28 100644
--- a/detectNet.cpp
+++ b/detectNet.cpp
@@ -72,7 +72,8 @@ detectNet* detectNet::Create( const char* prototxt, const char* model, float mea
 	printf("          -- batch_size  %u\n\n", maxBatchSize);
 	
 	//net->EnableDebug();
-	
+        net->DisableFP16();
+
 	std::vector<std::string> output_blobs;
 	output_blobs.push_back(coverage_blob);
 	output_blobs.push_back(bbox_blob);

The most common issue is that FP16 mode doesn’t support odd pooling layers.
Please check our user guide for details on the supported layers.
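
To check a deploy.prototxt for candidates, a small scan like the one below may help. It rests on an assumption: that "odd" refers to a pooling layer's kernel_size, which is not spelled out here. It is a line-based sketch rather than a real prototxt parser, and the function name oddPoolingLayers is hypothetical.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Return the names of layers whose pooling_param block has an odd kernel_size.
// Assumptions: one field per line, the layer's name line comes before its
// pooling_param block, and the block closes with a '}' on its own line.
std::vector<std::string> oddPoolingLayers(const std::string& prototxt)
{
    const std::string kernelKey = "kernel_size:";
    std::vector<std::string> result;
    std::istringstream lines(prototxt);
    std::string line;
    std::string layerName;
    bool inPooling = false;
    while (std::getline(lines, line)) {
        if (inPooling) {
            const std::size_t pos = line.find(kernelKey);
            if (pos != std::string::npos) {
                // atoi skips the spaces after the colon.
                const int k = std::atoi(line.c_str() + pos + kernelKey.size());
                if (k % 2 != 0)
                    result.push_back(layerName);
            } else if (line.find('}') != std::string::npos) {
                inPooling = false;
            }
            continue;
        }
        if (line.find("pooling_param") != std::string::npos) {
            inPooling = true;
            continue;
        }
        const std::size_t pos = line.find("name:");
        if (pos != std::string::npos) {
            // Remember the text between the quotes as the current layer name.
            const std::size_t open = line.find('"', pos);
            if (open == std::string::npos)
                continue;
            const std::size_t close = line.find('"', open + 1);
            if (close != std::string::npos)
                layerName = line.substr(open + 1, close - open - 1);
        }
    }
    return result;
}
```

Convolution kernels are ignored because only lines inside a pooling_param block are inspected. The user guide remains the reference for what FP16 mode actually supports.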

Thanks.

MingDu · November 10, 2017, 9:37am

Thank you so much for your help!!!
I’ll read the user guide carefully.
