YOLO TensorRT model for sample_object_detector

Hi!
I'm trying to use tensorRT_optimization to generate a TensorRT model from YOLO.
The YOLO .prototxt file is from https://github.com/TLESORT/YOLO-TensorRT-GIE- and its .caffemodel was converted using https://github.com/xingwangsfu/caffe-yolo/.
I successfully generated an optimized.bin model with tensorRT_optimization, using this command line:
./tensorRT_optimization --modelType=caffe --prototxt=/path/to/my/prototxt/file.prototxt --caffemodel=/path/to/my/caffemodel/file.caffemodel --outputBlobs=result

Then I tried to use sample_object_detector with the converted model optimized.bin.
The command line:
./sample_object_detector --tensorRT_model=/path/to/optimized.bin

However, it fails with the following error:

mec-lab@meclab-System-Product-Name:~/LBX/driveworks-1.2/bin$ ./sample_object_detector --tensorRT_model=/home/mec-lab/LBX/driveworks-1.2/tools/dnn/optimized.bin
[19-11-2018 12:54:51] Initialize DriveWorks SDK v1.2.400
[19-11-2018 12:54:51] Release build with GNU 4.8.5 from v1.2.0-rc11-0-ga7f5475
[19-11-2018 12:54:51] Platform: Detected Generic x86 Platform
[19-11-2018 12:54:51] TimeSource: monotonic epoch time offset is 1542597839247350
[19-11-2018 12:54:51] Platform: number of GPU devices detected 1
[19-11-2018 12:54:51] Platform: currently selected GPU device discrete ID 0
[19-11-2018 12:54:51] SDK: Resources mounted from .././data/resources
[19-11-2018 12:54:51] SensorFactory::createSensor() -> camera.virtual, tensorRT_model=/home/mec-lab/LBX/driveworks-1.2/tools/dnn/optimized.bin,video=.././data/samples/sfm/triangulation/video_0.h264
[19-11-2018 12:54:51] CameraNVCUVID: no seek table found at .././data/samples/sfm/triangulation/video_0.h264.seek, seeking is not available.
Camera image: 1280x800
Camera image with 1280x800 at 30 FPS
[19-11-2018 12:54:52] Added linear block of size 12845056
[19-11-2018 12:54:52] Added linear block of size 12845056
[19-11-2018 12:54:52] Added linear block of size 12845056
[19-11-2018 12:54:52] Added linear block of size 1605632
[19-11-2018 12:54:53] Driveworks exception thrown: DW_INVALID_ARGUMENT: blobIndex is larger than output binding count

terminate called after throwing an instance of 'std::runtime_error'
  what():  [2018-11-19 12:54:53] DW Error DW_INVALID_ARGUMENT executing DW function:
 dwDNN_getOutputSize(&m_networkOutputDimensions[1], 1U, m_dnn)
 at /builds/driveav/dw/sdk/samples/dnn/sample_object_detector/main.cpp:271
Aborted (core dumped)

By the way, I didn't train the YOLO model on my own data. Could that be the reason?
Please give me some hints. Thanks!

Hello, is this specific to the DriveOS platform?

Moving to Drive PX2 for better coverage.

Dear NVES
Yes, I moved to the DRIVE PX2 to generate the YOLO TensorRT model, but it shows the same error,
just like the one mentioned here: https://devtalk.nvidia.com/default/topic/1023364/general/how-to-run-sample-code-using-my-trained-data-custom-caffemodel-amp-custom-prototxt-on-host-pc-/

I used the same command for sample_object_detector:
./sample_object_detector --tensorRT_model=/path/to/optimized.bin

By the way, may I ask what “better coverage” means?
Thank you very much for your reply!

Hello,

This seems like a DriveOS PX2 question, so I moved your topic to the PX2 forum where you'll get more visibility from PX2 experts, hence “better coverage” for your question.

Dear NVES
OK! I will move my topic to the PX2 forum. Thank you very much for your help!

FYI, I already moved your topic. No action required.

Dear imugly1029,
Could you please tell us your PDK version? We will look into the issue.

Dear SivaRamaKrishna
I have the NVIDIA DRIVE™ PX 2 AutoChauffeur (P2379) for the PDK and NVIDIA DRIVE 5.0.5.0bL for the SDK.

Dear imugly1029,
It seems the network prototxts are different. Could you please get the caffemodel corresponding to https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/yolo_small_modified.prototxt from the developer, give it a try, and let us know if you run into any issues?

Dear SivaRamaKrishna
So does that mean I should use yolo_small_modified.prototxt and train on my own data to generate my own caffemodel?

However, I have tried the prototxt from https://github.com/xingwangsfu/caffe-yolo/blob/master/prototxt/yolo_small_deploy.prototxt and a caffemodel generated with https://github.com/xingwangsfu/caffe-yolo/blob/master/create_yolo_caffemodel.py using the weights file from
wget http://pjreddie.com/media/files/yolo-small.weights
It still didn't work. I think it's because the weights file doesn't correspond to the prototxt, right?

So I came up with an idea: I can take that weights file and generate its matching prototxt and caffemodel using https://github.com/xingwangsfu/caffe-yolo/blob/master/create_yolo_prototxt.py and https://github.com/xingwangsfu/caffe-yolo/blob/master/create_yolo_caffemodel.py.

I will reply if I run into any issues. Many thanks for your help!

@imugly1029
You are using DriveWorks 1.2.
If your network has only one output, you must change every part of the code that uses the m_networkOutputDimensions variable to a single dimension.
I hope this helps.

Dear bcchoi
Yes, I found that problem while tracing the sample_object_detector code. Originally, the variable m_networkOutputDimensions is an array with two elements:

dwBlobSize m_networkOutputDimensions[2];

and so are m_totalSizesOutput, *m_dnnOutputsDevice, and m_dnnOutputsHost:

float32_t *m_dnnOutputsDevice[2];
std::unique_ptr<float32_t[]> m_dnnOutputsHost[2];
uint32_t m_totalSizesOutput[2];

Following your suggestion, I tried to reduce them to single variables:

dwBlobSize m_networkOutputDimensions;
float32_t *m_dnnOutputsDevice;
std::unique_ptr<float32_t[]> m_dnnOutputsHost;
uint32_t m_totalSizesOutput;

But I found something interesting in the last line of the function onProcess():

interpretOutput(m_dnnOutputsHost[m_cvgIdx].get(), m_dnnOutputsHost[m_bboxIdx].get(),
                        &m_detectionRegion);

It seems m_dnnOutputsHost is indexed by the variable m_cvgIdx, and I am not sure about the purpose of that variable. Is it something like a confidence score? Also, the function interpretOutput takes three input parameters. Following your suggestion, I should modify all of them, right?
So I really want to know the purpose of the variable m_cvgIdx. Do you have any idea?

Dear SivaRamaKrishna
As I mentioned above, I tried using the pre-trained Darknet19 448x448 cfg and weights files from https://pjreddie.com/darknet/imagenet/#extraction to generate their own prototxt and caffemodel,
and they can be converted to a TensorRT model successfully using tensorRT_optimization:

mec-lab@meclab-System-Product-Name:~/LBX/driveworks-1.2/tools/dnn$ ./tensorRT_optimization --outputBlobs=prob --out=yolo_trt.bin --modelType=caffe --caffemodel=/home/mec-lab/LBX/caffe-yolo/yolo_v1.caffemodel --prototxt=/home/mec-lab/LBX/caffe-yolo/prototxt/yolo_v1.prototxt
Initializing network optimizer on model /home/mec-lab/LBX/caffe-yolo/prototxt/yolo_v1.prototxt with weights from /home/mec-lab/LBX/caffe-yolo/yolo_v1.caffemodel
Input "data": 3x448x448
Output "prob": 1000x1x1
Iteration 0: 5.1801 ms.
Iteration 1: 5.20602 ms.
Iteration 2: 5.21466 ms.
Iteration 3: 5.67603 ms.
Iteration 4: 5.39424 ms.
Iteration 5: 5.22403 ms.
Iteration 6: 5.68115 ms.
Iteration 7: 5.38576 ms.
Iteration 8: 5.20051 ms.
Iteration 9: 5.21114 ms.
Average over 10 runs is 5.33736 ms.

However, the same error shows up when running sample_object_detector:

mec-lab@meclab-System-Product-Name:~/LBX/driveworks-1.2/bin$ ./sample_object_detector --tensorRT_model=/home/mec-lab/LBX/driveworks-1.2/tools/dnn/yolo_trt.bin
[23-11-2018 11:30:15] Initialize DriveWorks SDK v1.2.400
[23-11-2018 11:30:15] Release build with GNU 4.8.5 from v1.2.0-rc11-0-ga7f5475
[23-11-2018 11:30:15] Platform: Detected Generic x86 Platform
[23-11-2018 11:30:15] TimeSource: monotonic epoch time offset is 1542936490120149
[23-11-2018 11:30:15] Platform: number of GPU devices detected 1
[23-11-2018 11:30:15] Platform: currently selected GPU device discrete ID 0
[23-11-2018 11:30:15] SDK: Resources mounted from .././data/resources
[23-11-2018 11:30:15] SensorFactory::createSensor() -> camera.virtual, tensorRT_model=/home/mec-lab/LBX/driveworks-1.2/tools/dnn/yolo_trt.bin,video=.././data/samples/sfm/triangulation/video_0.h264
[23-11-2018 11:30:15] CameraNVCUVID: no seek table found at .././data/samples/sfm/triangulation/video_0.h264.seek, seeking is not available.
Camera image: 1280x800
Camera image with 1280x800 at 30 FPS
[23-11-2018 11:30:15] Added linear block of size 25690112
[23-11-2018 11:30:15] Added linear block of size 6422528
[23-11-2018 11:30:15] Added linear block of size 903168
[23-11-2018 11:30:17] DNN: Missing or incompatible parameter in metadata (ignoreAspectRatio). Parameter is set to default value.
[23-11-2018 11:30:17] DNN: Missing or incompatible parameter in metadata (doPerPlaneMeanNormalization). Parameter is set to default value.
[23-11-2018 11:30:17] DNN: Missing or incompatible parameter in metadata (tonemapType). Parameter is set to default value.
[23-11-2018 11:30:17] Driveworks exception thrown: DW_INVALID_ARGUMENT: blobIndex is larger than output binding count

terminate called after throwing an instance of 'std::runtime_error'
  what():  [2018-11-23 11:30:17] DW Error DW_INVALID_ARGUMENT executing DW function:
 dwDNN_getOutputSize(&m_networkOutputDimensions[1], 1U, m_dnn)
 at /builds/driveav/dw/sdk/samples/dnn/sample_object_detector/main.cpp:271
Aborted (core dumped)

I think it may be related to what bcchoi mentioned above…

Dear imugly1029,
The sample_object_detector expects your network to have two outputs: coverage and bounding boxes.
Can you confirm that you have set the correct blob names corresponding to these two? Also, can you double-check whether the network input and output sizes in the code match the prototxt?

Dear imugly1029,
Is this issue resolved?

Dear SivaRamaKrishna
So sorry for replying so late; I have been busy with other work, so I haven't tried it yet.
By the way, does confirming the correct two blob names mean confirming them in sample_object_detector?

// Get coverage and bounding box blob indices
const char *coverageBlobName = "coverage";
const char *boundingBoxBlobName = "bboxes";
CHECK_DW_ERROR(dwDNN_getOutputIndex(&m_cvgIdx, coverageBlobName, m_dnn));
CHECK_DW_ERROR(dwDNN_getOutputIndex(&m_bboxIdx, boundingBoxBlobName, m_dnn));

or in the .prototxt file?

And where should I check whether the input and output sizes match?

Dear SivaRamaKrishna
Is there any new hint?
I really don't know where to start with the modification.

I solved this problem using the approach bcchoi mentioned.
The YOLO network I used has only one output blob, and its size is 1 x 1 x 1470.
However, sample_object_detector expects the network to have two outputs,
so I changed all the arrays that correspond to two outputs to one.
Moreover, I commented out the function nonMaxSuppression and modified the function interpretOutput so that I can read the data stored in YOLO's output blob and use it to draw bounding boxes.