detectnet-console fails to profile custom model on TX2

Background:

  • I followed the instructions at https://github.com/dusty-nv/jetson-inference to install DIGITS, JetPack, and other packages on the host and on the Jetson TX2.
  • I followed the instructions at https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection to train DetectNet.
  • I trained the original DetectNet with GoogLeNet as the convolutional backbone and successfully ran the model on the TX2 using detectnet-console.
  • To make DetectNet run faster, I replaced GoogLeNet with AlexNet and trained a customized model.
  • When I ran the customized model on the TX2 using detectnet-console, I got an error.

Problem: When trying to load my customized DetectNet model, I get the following error message:

*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000265cd8e0 ***
Aborted (core dumped)

With all debug messages printed, the output is:

detectnet-console
  args (8):  0 [./detectnet-console]  1 [/home/nvidia/Downloads/000021.png]  2 [output1.jpg]  3 [--prototxt=/home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt]  4 [--model=/home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel]  5 [--input_blob=data]  6 [--output_cvg=coverage]  7 [--output_bbox=bboxes]  


detectNet -- loading detection network model from:
          -- prototxt    /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt
          -- model       /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel
          -- input_blob  'data'
          -- output_cvg  'coverage'
          -- output_bbox 'bboxes'
          -- mean_pixel  0.000000
          -- threshold   0.500000
          -- batch_size  2

[GIE]  attempting to open cache file /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel.2.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform has FP16 support.
[GIE]  loading /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/deploy.prototxt /home/nvidia/models/detectnet/Alexbn/20171030-204120-b15a_epoch_5.0/snapshot_iter_500.caffemodel
[GIE]  retrieved output tensor 'coverage'
[GIE]  retrieved output tensor 'bboxes'
[GIE]  configuring CUDA engine
[GIE]  building CUDA engine
[GIE]  Original: 25 layers
[GIE]  After dead-layer removal: 25 layers
[GIE]  After scale fusion: 25 layers
[GIE]  Fusing  conv1 with activation relu1
[GIE]  Fusing  conv2 with activation relu2
[GIE]  Fusing  conv3 with activation relu3
[GIE]  Fusing  conv4 with activation relu4
[GIE]  Fusing  conv5 with activation relu5
[GIE]  Fusing  conv-post1 with activation relu-post1
[GIE]  After conv-act fusion: 19 layers
[GIE]  After tensor merging: 19 layers
[GIE]  After concat removal: 19 layers
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region bn0: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region bn1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region bn2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region bn3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region bn4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region bn-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region data: NC2HW_F16
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region bn0: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region bn1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region bn2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region bn3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region bn4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region bn-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region coverage: NC2HW_F16
[GIE]  Region bboxes: NC2HW_F16
[GIE]  
[GIE]  Node deploy_transform: NC2HW_F16
[GIE]  Node bn0: NC2HW_F16
[GIE]  Node conv1 + relu1: NC2HW_F16
[GIE]  Node pool1: NC2HW_F16
[GIE]  Node bn1: NC2HW_F16
[GIE]  Node conv2 + relu2: NC2HW_F16
[GIE]  Node pool2: NC2HW_F16
[GIE]  Node bn2: NC2HW_F16
[GIE]  Node conv3 + relu3: NC2HW_F16
[GIE]  Node bn3: NC2HW_F16
[GIE]  Node conv4 + relu4: NC2HW_F16
[GIE]  Node bn4: NC2HW_F16
[GIE]  Node conv5 + relu5: NC2HW_F16
[GIE]  Node pool5: NC2HW_F16
[GIE]  Node conv-post1 + relu-post1: NC2HW_F16
[GIE]  Node bn-post1: NC2HW_F16
[GIE]  Node cvg/classifier: NC2HW_F16
[GIE]  Node coverage/sig: NC2HW_F16
[GIE]  Node bbox/regressor: NC2HW_F16
[GIE]  
[GIE]  Adding reformat layer: deploy_transform reformatted input 0 (data) from NCHW_F32 to NC2HW_F16
[GIE]  Adding reformat layer: coverage/sig reformatted output 0 (coverage) from NC2HW_F16 to NCHW_F32
[GIE]  Adding reformat layer: bbox/regressor reformatted output 0 (bboxes) from NC2HW_F16 to NCHW_F32
[GIE]  After reformat layers: 22 layers
[GIE]  Block size 524288000
[GIE]  Block size 11501568
[GIE]  Block size 7667712
[GIE]  Block size 7667712
[GIE]  Total Activation Memory: 551124992
[GIE]  
[GIE]  --------------- Timing deploy_transform input reformatter 0(9)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing deploy_transform(10)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing bn0(10)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing conv1 + relu1(3)
*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000265cd8e0 ***
Aborted (core dumped)

I then stepped into the program with a debugger and found that it exits at line 166 of tensorNet.cpp:

nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

Based on the debug output, I think the prototxt and model are loaded correctly. The error occurs while TensorRT is timing the conv1 + relu1 layer.

You can find all the files needed to analyze and reproduce the problem here, including deploy.prototxt and the snapshot model:
https://drive.google.com/open?id=1QL2Dm0LHk7ymaUubJ4aYiuT6Py13kk0Y

Please help!
Thank you.

Hi,

Does the original GoogLeNet-based detection network also contain BatchNorm layers?
Batch normalization needs to be implemented with a Scale layer.
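To illustrate what the BatchNorm + Scale pairing means numerically: in BVLC Caffe, a BatchNorm layer only normalizes with stored statistics (its blobs hold the accumulated mean, the accumulated variance, and a moving-average factor), while the learned per-channel gamma and beta live in a following Scale layer. At inference time the pair collapses into one per-channel scale and shift. The helper below is a minimal, hypothetical sketch of that folding, assuming the BVLC blob layout; `foldBatchNormScale` and `ChannelAffine` are illustrative names, not code from TensorRT or jetson-inference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-channel affine transform: y = scale[c] * x + shift[c].
struct ChannelAffine {
    std::vector<float> scale;
    std::vector<float> shift;
};

// Hypothetical helper: folds a Caffe BatchNorm layer followed by a Scale layer
// into a single per-channel affine transform.
//   mean, var   : BatchNorm blobs[0] and blobs[1] (accumulated, not yet normalized)
//   factor      : BatchNorm blobs[2][0], the moving-average normalization factor
//   gamma, beta : Scale layer weights and bias
//   eps         : BatchNorm epsilon (Caffe's default is 1e-5)
ChannelAffine foldBatchNormScale(const std::vector<float>& mean,
                                 const std::vector<float>& var,
                                 float factor,
                                 const std::vector<float>& gamma,
                                 const std::vector<float>& beta,
                                 float eps = 1e-5f) {
    const std::size_t n = mean.size();
    assert(var.size() == n && gamma.size() == n && beta.size() == n);
    // Caffe divides the stored sums by the factor (a zero factor yields zeros).
    const float norm = (factor == 0.0f) ? 0.0f : 1.0f / factor;
    ChannelAffine out;
    out.scale.resize(n);
    out.shift.resize(n);
    for (std::size_t c = 0; c < n; ++c) {
        const float m = mean[c] * norm;
        const float v = var[c] * norm;
        // gamma * (x - m) / sqrt(v + eps) + beta  ==  s * x + (beta - s * m)
        const float s = gamma[c] / std::sqrt(v + eps);
        out.scale[c] = s;
        out.shift[c] = beta[c] - s * m;
    }
    return out;
}
```

The folded form is a plain scale-and-shift per channel, which is why the normalization and the learned affine part can be expressed together as one scale operation.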

By the way, are you using TensorRT 3?
Thanks.

Hi AastaLLL,

Thanks for your reply.

  1. The GoogLeNet-based network doesn’t have BN layers.

  2. I am using TensorRT 2.1, installed with JetPack.

After posting the problem, I also suspected that the BN layers might be causing it.
So I deleted all the BN layers from the AlexNet-based model and retrained it.
The result is the same:

detectnet-console
  args (8):  0 [./detectnet-console]  1 [/home/nvidia/Downloads/000021.png]  2 [output1.jpg]  3 [--prototxt=/home/nvidia/models/detectnet/Alexnet/deploy.prototxt]  4 [--model=/home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel]  5 [--input_blob=data]  6 [--output_cvg=coverage]  7 [--output_bbox=bboxes]  


detectNet -- loading detection network model from:
          -- prototxt    /home/nvidia/models/detectnet/Alexnet/deploy.prototxt
          -- model       /home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel
          -- input_blob  'data'
          -- output_cvg  'coverage'
          -- output_bbox 'bboxes'
          -- mean_pixel  0.000000
          -- threshold   0.500000
          -- batch_size  2

[GIE]  attempting to open cache file /home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel.2.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform has FP16 support.
[GIE]  loading /home/nvidia/models/detectnet/Alexnet/deploy.prototxt /home/nvidia/models/detectnet/Alexnet/snapshot_iter_4160.caffemodel
[GIE]  retrieved output tensor 'coverage'
[GIE]  retrieved output tensor 'bboxes'
[GIE]  configuring CUDA engine
[GIE]  building CUDA engine
[GIE]  Original: 19 layers
[GIE]  After dead-layer removal: 19 layers
[GIE]  After scale fusion: 19 layers
[GIE]  Fusing  conv1 with activation relu1
[GIE]  Fusing  conv2 with activation relu2
[GIE]  Fusing  conv3 with activation relu3
[GIE]  Fusing  conv4 with activation relu4
[GIE]  Fusing  conv5 with activation relu5
[GIE]  Fusing  conv-post1 with activation relu-post1
[GIE]  After conv-act fusion: 13 layers
[GIE]  After tensor merging: 13 layers
[GIE]  After concat removal: 13 layers
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region data: NC2HW_F16
[GIE]  Region transformed_data: NC2HW_F16
[GIE]  Region conv1: NC2HW_F16
[GIE]  Region pool1: NC2HW_F16
[GIE]  Region conv2: NC2HW_F16
[GIE]  Region pool2: NC2HW_F16
[GIE]  Region conv3: NC2HW_F16
[GIE]  Region conv4: NC2HW_F16
[GIE]  Region conv5: NC2HW_F16
[GIE]  Region pool5: NC2HW_F16
[GIE]  Region conv-post1: NC2HW_F16
[GIE]  Region cvg/classifier: NC2HW_F16
[GIE]  Region coverage: NC2HW_F16
[GIE]  Region bboxes: NC2HW_F16
[GIE]  
[GIE]  Node deploy_transform: NC2HW_F16
[GIE]  Node conv1 + relu1: NC2HW_F16
[GIE]  Node pool1: NC2HW_F16
[GIE]  Node conv2 + relu2: NC2HW_F16
[GIE]  Node pool2: NC2HW_F16
[GIE]  Node conv3 + relu3: NC2HW_F16
[GIE]  Node conv4 + relu4: NC2HW_F16
[GIE]  Node conv5 + relu5: NC2HW_F16
[GIE]  Node pool5: NC2HW_F16
[GIE]  Node conv-post1 + relu-post1: NC2HW_F16
[GIE]  Node cvg/classifier: NC2HW_F16
[GIE]  Node coverage/sig: NC2HW_F16
[GIE]  Node bbox/regressor: NC2HW_F16
[GIE]  
[GIE]  Adding reformat layer: deploy_transform reformatted input 0 (data) from NCHW_F32 to NC2HW_F16
[GIE]  Adding reformat layer: coverage/sig reformatted output 0 (coverage) from NC2HW_F16 to NCHW_F32
[GIE]  Adding reformat layer: bbox/regressor reformatted output 0 (bboxes) from NC2HW_F16 to NCHW_F32
[GIE]  After reformat layers: 16 layers
[GIE]  Block size 524288000
[GIE]  Block size 11501568
[GIE]  Block size 7667712
[GIE]  Block size 3744
[GIE]  Total Activation Memory: 543461024
[GIE]  
[GIE]  --------------- Timing deploy_transform input reformatter 0(9)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing deploy_transform(10)
[GIE]  Tactic 0 is the only option, timing skipped
[GIE]  
[GIE]  --------------- Timing conv1 + relu1(3)


*** Error in `./detectnet-console': free(): corrupted unsorted chunks: 0x00000000266d6bc0 ***
Aborted (core dumped)

I trained the model using DIGITS, which uses NVCaffe as its backend.
As far as I know, NVCaffe integrates batch normalization into a single BatchNorm layer, and I assumed the Caffe parser in TensorRT supports NVCaffe. Is that correct, or do I still need to use a Scale layer?
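One way to see which form a given model uses is to look at the layer that follows each BatchNorm layer in deploy.prototxt. The sketch below is a rough, hypothetical text scan, not a protobuf parser: it assumes one field per line, no braces inside quoted strings, and that each layer's `type:` sits directly inside its `layer { ... }` block. It only checks file order, not bottom/top connectivity, and it says nothing about whether a particular TensorRT version accepts NVCaffe's single-layer form. `layerTypes` and `findUnpairedBatchNorm` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the `type:` of each top-level layer, in file order.
// Only `type:` fields at brace depth 1 (directly inside `layer { ... }`) are kept,
// so nested ones such as weight_filler { type: "xavier" } are skipped.
std::vector<std::string> layerTypes(const std::string& prototxt) {
    static const std::regex typeField(R"re(^\s*type\s*:\s*"([^"]+)")re");
    std::vector<std::string> types;
    int depth = 0;
    std::size_t pos = 0;
    while (pos < prototxt.size()) {
        std::size_t end = prototxt.find('\n', pos);
        if (end == std::string::npos) end = prototxt.size();
        const std::string line = prototxt.substr(pos, end - pos);
        std::smatch m;
        // Check the field before updating depth with this line's braces.
        if (depth == 1 && std::regex_search(line, m, typeField))
            types.push_back(m[1].str());
        for (char ch : line) {
            if (ch == '{') ++depth;
            else if (ch == '}') --depth;
        }
        assert(depth >= 0 && "unbalanced braces in prototxt");
        pos = end + 1;
    }
    return types;
}

// Returns the positions (in the list from layerTypes) of BatchNorm layers whose
// next layer in file order is not a Scale layer.
std::vector<std::size_t> findUnpairedBatchNorm(const std::vector<std::string>& types) {
    std::vector<std::size_t> unpaired;
    for (std::size_t i = 0; i < types.size(); ++i) {
        const bool last = (i + 1 == types.size());
        if (types[i] == "BatchNorm" && (last || types[i + 1] != "Scale"))
            unpaired.push_back(i);
    }
    return unpaired;
}
```

Reading deploy.prototxt into a `std::string` (for example with `std::ifstream` and `std::istreambuf_iterator<char>`) and passing the result through both functions lists every BatchNorm layer that is not immediately followed by a Scale layer.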

Ming

Hi,

Thanks for the feedback.
We will test your custom model and update you later.

Hi,

After turning off FP16 mode, we can run inference on your model without errors.
Here is the change for your reference:

diff --git a/detectNet.cpp b/detectNet.cpp
index 824ae98..3cfeb28 100644
--- a/detectNet.cpp
+++ b/detectNet.cpp
@@ -72,7 +72,8 @@ detectNet* detectNet::Create( const char* prototxt, const char* model, float mea
 	printf("          -- batch_size  %u\n\n", maxBatchSize);
 	
 	//net->EnableDebug();
-	
+        net->DisableFP16();
+
 	std::vector<std::string> output_blobs;
 	output_blobs.push_back(coverage_blob);
 	output_blobs.push_back(bbox_blob);

The most common cause is that FP16 mode doesn’t support odd pooling layers.
Please check our user guide for details on the supported layers.
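The reply does not spell out what makes a pooling layer "odd". One thing that is easy to check by hand is the pooling geometry of each layer: Caffe's PoolingLayer rounds its output size up (ceil) and then clips it when padding is used, while the floor formula used for convolutions can give a different size. The sketch below is a hypothetical diagnostic that compares the two roundings and flags odd output sizes; whether either condition is what breaks FP16 in this TensorRT version is an assumption, not something confirmed in this thread. All names are illustrative.

```cpp
#include <cassert>

// Output length along one spatial axis, following Caffe's PoolingLayer:
// ceil-rounded, then clipped so the last window starts inside the padded input.
// (Caffe applies the clip when any pad is non-zero; this per-axis version is a
// simplification.)
int caffePooledSize(int in, int kernel, int stride, int pad) {
    assert(stride > 0 && in + 2 * pad >= kernel);
    // Integer ceil division of a non-negative numerator, plus one.
    int out = (in + 2 * pad - kernel + stride - 1) / stride + 1;
    if (pad > 0 && (out - 1) * stride >= in + pad) --out;
    return out;
}

// Floor-rounded output length, as used by convolution layers.
int floorPooledSize(int in, int kernel, int stride, int pad) {
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

struct PoolCheck {
    int caffe;       // Caffe (ceil) output size
    int floored;     // floor output size
    bool mismatch;   // the two roundings disagree
    bool oddOutput;  // Caffe output size is odd
};

// Hypothetical diagnostic for one axis of one pooling layer.
PoolCheck checkPoolAxis(int in, int kernel, int stride, int pad) {
    PoolCheck c{};
    c.caffe = caffePooledSize(in, kernel, stride, pad);
    c.floored = floorPooledSize(in, kernel, stride, pad);
    c.mismatch = (c.caffe != c.floored);
    c.oddOutput = (c.caffe % 2 != 0);
    return c;
}
```

For example, a 3x3, stride-2 pooling over a 56-wide input gives 28 with Caffe's rounding but 27 with floor rounding, while a 55-wide input gives 27 either way (an odd size). Running each pool layer of deploy.prototxt through `checkPoolAxis` for both height and width is a quick way to spot such layers before comparing them against the user guide.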

Thanks.

Thank you so much for your help!!!
I’ll read the user guide carefully.