I'm currently working on accelerating VGG16-SSD with TensorRT 3.0. I compared the output of every layer with that of the Caffe version of VGG16-SSD (by comparing the first 100 values of each layer's output data), and everything matches up to the Softmax layer. The Softmax output is wrong: the 21 float values don't even sum to 1. I suspect something is wrong with the reshape plugin layer mbox_conf_reshape, which sits right before the Softmax, but the problem persists no matter how I adjust that layer's output shape. I've been debugging for two days and re-read Caffe's softmax .cpp/.cu code, but found nothing that helps. Can your team give some advice?
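A quick way to narrow this down is to dump the buffers on both sides of the Softmax and check them on the host. The sketch below is a minimal, framework-free check, assuming the confidence data is laid out row-major as `[num_priors, 21]` (21 class scores per prior box, softmax over the last axis, as in a typical SSD `mbox_conf_reshape` → `mbox_conf_softmax` setup). The function names are hypothetical helpers, not part of TensorRT or Caffe. Feed the dumped reshape output into `reference_softmax` and compare it with the TensorRT Softmax output; `bad_groups` lists the prior boxes whose scores do not sum to 1.

```python
import math

# Assumption: 20 VOC classes + background = 21 scores per prior box.
NUM_CLASSES = 21


def reference_softmax(logits, num_classes=NUM_CLASSES):
    """Softmax over each consecutive group of num_classes values.

    Assumes a row-major [num_priors, num_classes] layout, i.e. the softmax
    axis is the innermost (last) dimension.
    """
    if len(logits) % num_classes != 0:
        raise ValueError("buffer length is not a multiple of num_classes")
    out = []
    for start in range(0, len(logits), num_classes):
        group = logits[start:start + num_classes]
        m = max(group)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in group]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


def bad_groups(probs, num_classes=NUM_CLASSES, tol=1e-4):
    """Return indices of prior boxes whose class scores do not sum to 1."""
    bad = []
    for i, start in enumerate(range(0, len(probs), num_classes)):
        s = sum(probs[start:start + num_classes])
        if abs(s - 1.0) > tol:
            bad.append(i)
    return bad


def max_abs_diff(a, b):
    """Largest element-wise difference between two equally sized buffers."""
    return max(abs(x - y) for x, y in zip(a, b))
```

If `reference_softmax(reshape_output)` matches the Caffe softmax output but not the TensorRT one, the reshape output itself is likely fine and the mismatch is in which axis the TensorRT Softmax is normalizing over; if the two reference results already disagree, the reshape plugin's output layout is the more likely culprit.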