TensorRT 3.0 RC: running SSD fails in the DetectionOutput layer

I’m trying to build the object detection framework SSD with the TensorRT 3.0 plugin API. The DetectionOutput layer is created by the create function from the NVIDIA plugin library, createSSDDetectionOutputPlugin, but it returns the error message “Plugin layer output count is not equal to caffe output count”.

This is my DetectionOutput definition:

layer {
  name: "detection_out"
  type: "DetectionOutputPlugin"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  #include {
  #  phase: TEST
  #}
  #detection_output_param {
  #  num_classes: 201
  #  share_location: true
  #  background_label_id: 0
  #  nms_param {
  #    nms_threshold: 0.45
  #    top_k: 400
  #  }
  #  save_output_param {
  #    label_map_file: "./models/SSD_300x300/labelmap_ilsvrc_det.prototxt"
  #  }
  #  code_type: CENTER_SIZE
  #  keep_top_k: 200
  #  confidence_threshold: 0.01
  #}
}
I checked the bottom layers "mbox_loc", "mbox_conf_flatten" and "mbox_priorbox"; all of them match Caffe in both shape and data.

Here is my code as implemented in the plugin factory:

// INvPlugin * createSSDDetectionOutputPlugin(DetectionOutputParameters param);
        DetectionOutputParameters det_output_param;
        det_output_param.shareLocation = true;
        det_output_param.varianceEncodedInTarget = false;
        det_output_param.backgroundLabelId = 0;
        det_output_param.numClasses = 201;
        det_output_param.topK = 400;
        det_output_param.keepTopK = 200;
        det_output_param.confidenceThreshold = 0.01;
        det_output_param.nmsThreshold = 0.45;
        det_output_param.codeType = CENTER_SIZE;

        std::unique_ptr<INvPlugin, decltype(nvPluginDeleter)> plugin =
            std::unique_ptr<INvPlugin, decltype(nvPluginDeleter)>(createSSDDetectionOutputPlugin(det_output_param), nvPluginDeleter);

Have you solved this problem? I have this error too, but I checked the dimensions of each output layer and found that the priorbox layer is (2, m, 1), which in Caffe is (1, 2, m). So maybe this is the problem? Or am I doing something wrong with the createSSDpriorbox plugin?

I have run into the same problem. Have you solved it yet? I added some layers in PluginFactory, such as Reshape, Permute and so on. When I run the net, it returns the error message “Plugin layer output count is not equal to caffe output count” after the detection_out layer, with the warning: “Flatten layer ignored. GIE implicitly flattens input to Fully Connected layers, but in other circumstances this will result in undefined behavior.” I did not add a Flatten layer in PluginFactory, and I’m not sure whether it is necessary to add one. Did you add a Flatten layer in PluginFactory?

Hi fujiaweigege, I added the Flatten layer in it myself, and I still have no idea about this problem. Sorry.

I have the same problem. @mzchtx @AastaLLL

Hi, were you guys able to solve the problem? If so, how did you do it?


Instead of a Flatten layer, which collapses (2, m, 1) into a single dimension of size 2·m·1, did you try making use of a Reshape layer?

layer {
  name: "reshape"
  type: "Reshape"
  bottom: "input"
  top: "output"
  reshape_param {
    shape {
      dim: 1   # copy the dimension from below
      dim: 2
      dim: -1  # infer it from the other dimensions
    }
  }
}
Hi @adolf.hohl, you should write the detection_out layer yourself; that can solve this problem.

Hi guys

This problem can be fixed by adding another output to the detection_out layer, as follows (in the deploy.prototxt file):

layer {
  name: "detection_out"
  type: "DetectionOutputPlugin"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  top: "out2"
}
For details, please refer to this thread.


Hi, I tried the fix and it worked, but I am not sure if I am getting the correct outputs. In Caffe, the number of detections from the detection output layer depends on whether the object is on the screen or not; with TensorRT I always get keep_top_k detections. Do you also have the same “problem”?

@marvinreza I guess this is expected: in TensorRT one has to pre-specify the output dimensions of a plugin, so any dynamic change in output dimensions is not possible.

However, you can compare the raw output values of TensorRT with Caffe’s output for a test image to check the correctness of the outputs.
I have yet to implement the Softmax, so I can’t give a verdict myself.
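If you want Caffe-like variable-length results anyway, you can post-filter the fixed-size output on the host. A minimal sketch, assuming the unused slots in the keep_top_k output are padded with -1 labels or near-zero confidences (verify the padding convention your plugin actually uses; the `Detection` struct here is just an illustration of the 7-value row layout):

```cpp
#include <vector>

// One detection row as commonly produced by an SSD detection-output stage:
// {image_id, label, confidence, xmin, ymin, xmax, ymax}.
struct Detection {
    float imageId, label, confidence;
    float xmin, ymin, xmax, ymax;
};

// Drop the padding rows from a fixed-size (keep_top_k) output, keeping only
// rows with a valid label and a confidence above the threshold.
std::vector<Detection> dropPadding(const std::vector<Detection>& fixed,
                                   float confThreshold) {
    std::vector<Detection> kept;
    for (const auto& d : fixed)
        if (d.label >= 0.f && d.confidence >= confThreshold)
            kept.push_back(d);
    return kept;
}
```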


The Softmax in TensorRT only works across the channel dimension, but in SSD the softmax axis is 2, so you have to implement it yourself.
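For reference, here is a minimal sketch of what such a custom softmax has to do: apply the softmax along axis 2 of a row-major (d0, d1, d2) tensor. This is a host-side illustration only; a real TensorRT plugin would do the same computation in a CUDA kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax along axis 2 of a row-major tensor of shape (d0, d1, d2).
// For SSD's mbox_conf this is the per-prior class-score axis, which the
// built-in channel-wise softmax cannot handle without reshaping.
void softmaxAxis2(std::vector<float>& data, int d0, int d1, int d2) {
    for (int i = 0; i < d0 * d1; ++i) {
        float* row = data.data() + static_cast<size_t>(i) * d2;
        // Subtract the row max for numerical stability.
        float maxVal = row[0];
        for (int k = 1; k < d2; ++k) maxVal = std::max(maxVal, row[k]);
        float sum = 0.f;
        for (int k = 0; k < d2; ++k) {
            row[k] = std::exp(row[k] - maxVal);
            sum += row[k];
        }
        for (int k = 0; k < d2; ++k) row[k] /= sum;
    }
}
```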

And I do not know what the problem is with the plugin in TensorRT 3.0, so I have written the detection layer myself and can get the right result.
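For anyone writing their own detection layer: after decoding the boxes, the core of it is greedy per-class NMS. A minimal host-side sketch (box decoding, the CENTER_SIZE math, and the per-class loop are omitted; the `Box` struct is just an assumption for illustration):

```cpp
#include <algorithm>
#include <vector>

struct Box { float xmin, ymin, xmax, ymax, score; };

// Intersection-over-union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    float ix = std::max(0.f, std::min(a.xmax, b.xmax) - std::max(a.xmin, b.xmin));
    float iy = std::max(0.f, std::min(a.ymax, b.ymax) - std::max(a.ymin, b.ymin));
    float inter = ix * iy;
    float areaA = (a.xmax - a.xmin) * (a.ymax - a.ymin);
    float areaB = (b.xmax - b.xmin) * (b.ymax - b.ymin);
    return inter / (areaA + areaB - inter);
}

// Greedy NMS: keep the highest-scoring box, suppress any later box that
// overlaps a kept box by more than nmsThreshold, repeat.
std::vector<Box> nms(std::vector<Box> boxes, float nmsThreshold) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const auto& cand : boxes) {
        bool suppressed = false;
        for (const auto& k : kept)
            if (iou(cand, k) > nmsThreshold) { suppressed = true; break; }
        if (!suppressed) kept.push_back(cand);
    }
    return kept;
}
```

With nms_threshold 0.45 and top_k / keep_top_k truncation around it, this mirrors what the Caffe DetectionOutput layer does per class.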

What inference time have you achieved? And what is the execution time of your last (detection_out) layer?

Sorry for the late reply. I guess you use the timing function from the samples, am I right?
I tested SSD in TensorRT on a GTX 1060 and get 27 ms with 1080p video.
Maybe you can use a different timing function to measure how long a single image or detection inference takes. My machine is busy with other things, so I’m sorry I can’t measure the time cost right now; I will test it soon.
And you can try what I suggested; maybe you will get the right answer.

On my GTX 1060, the last layer only takes 1 or 2 ms.

When I upgraded TensorRT to 3.0.2, it worked for me.

Do you mean that you do not need to write two top outputs, or that it works in TRT 3.0.2 when you do write two outputs from the layer?