TensorRT YOLO inference error

TLESORT · January 27, 2017, 2:46pm

Hi,

I am currently trying to port some code from the Caffe framework to TensorRT (GIE) on a Jetson TX1.

I installed “JetPack 2.3.1 - L4T R24.2.1 released for Jetson TX1” and everything seems to be OK for TensorRT.

The code from https://github.com/dusty-nv/jetson-inference, which uses TensorRT, works on the board.

The GIE sample “sampleMNISTGIE” from the TensorRT package “nv-gie-repo-ubuntu1404-6-rc-cuda8.0_1.0.3-1_amd64.deb” also works.

I am now trying to get a neural network, YOLO, to work with TensorRT.
The prototxt of the network is this one: https://github.com/xingwangsfu/caffe-yolo/blob/master/prototxt/yolo_small_deploy.prototxt

I think that everything in this network is compatible with TensorRT.

I also have the associated caffemodel file for running detection on images, and everything works together with the Caffe framework.

With TensorRT I don’t get any errors, but the output of the neural network is wrong.
For the very same image, Caffe and TensorRT give two completely different outputs.
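
One way to put a number on "completely different" is to dump the raw output blob from both frameworks for the same input and compare them element by element. The helper below is only a sketch: `maxAbsDiff` is a hypothetical name, not part of Caffe or TensorRT, and it assumes both outputs have already been copied into host vectors of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest element-wise absolute difference between two output blobs,
// e.g. the raw network output dumped from Caffe and from TensorRT.
// Hypothetical helper: not part of either framework.
float maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

A small value suggests the difference comes from post-processing, while a large one points at the network itself or at the input preprocessing.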

Here is the code I use :

[code]

IBuilder* builder = createInferBuilder(gLogger);
const char* prototxt="yolo_small_deploy.prototxt";
const char* caffemodel="yolo_small.caffemodel";

// parse the caffe model to populate the network, then set the outputs and create an engine
//ICudaEngine* engine = createMNISTEngine(maxBatchSize, builder, DataType::kFLOAT);
INetworkDefinition* network = builder->createNetwork();
ICaffeParser *parser = createCaffeParser();
const IBlobNameToTensor *blobNameToTensor =parser->parse(prototxt,     // caffe deploy file
                             caffemodel,     // caffe model file
                             *network,              // network definition that parser populate
                             DataType::kFLOAT);

assert(blobNameToTensor != nullptr);
// the caffe file has no notion of outputs
// so we need to manually say which tensors the engine should generate
network->markOutput(*blobNameToTensor->find(OUTPUT_BLOB_NAME));
// Build the engine
builder->setMaxBatchSize(1);
builder->setMaxWorkspaceSize(16 << 20);//WORKSPACE_SIZE);

// Eliminate the side-effect from the delay of GPU frequency boost
builder->setMinFindIterations(3);
builder->setAverageFindIterations(2);

//build
ICudaEngine *engine = builder->buildCudaEngine(*network);

IExecutionContext *context = engine->createExecutionContext();

// run inference
float prob[BATCH_SIZE * OUTPUT_SIZE]; // sized for the whole batch that is copied back below

// input and output buffer pointers that we pass to the engine - the engine requires exactly IEngine::getNbBindings(),
// of these, but in this case we know that there is exactly one input and one output.
assert(engine->getNbBindings() == 2);
void* buffers[2];

// In order to bind the buffers, we need to know the names of the input and output tensors.
// note that indices are guaranteed to be less than IEngine::getNbBindings()
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);

// create GPU buffers and a stream
CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE *3* INPUT_H * INPUT_W * sizeof(float)));
CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
// DMA the input to the GPU,  execute the batch asynchronously, and DMA it back:
CHECK(cudaMemcpyAsync(buffers[inputIndex], mInputCPU[0], BATCH_SIZE *3* INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
context->enqueue(BATCH_SIZE, buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(prob, buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE*sizeof(float), cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);

// release the stream and the buffers
cudaStreamDestroy(stream);
CHECK(cudaFree(buffers[inputIndex]));
CHECK(cudaFree(buffers[outputIndex]));

// destroy the engine
context->destroy();
engine->destroy();

[/code]

The outputs of the neural network with TensorRT are similar for different images. Here are the results for a cat and for a matrix of zeros:

https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection.jpg
https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/zeros_detection.jpg

The detection produced when the neural network is run with Caffe:

https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/true_detection.jpg

The complete code is on :

Do you have any idea what could possibly be going wrong?
It seems like there is an error in the conversion of the caffemodel that makes the result wrong.
Thank you for your help :)

AastaLLL · February 16, 2017, 2:40am

Hi,

Thanks for your question. We are investigating this issue now and will update you later.

AastaLLL · February 17, 2017, 6:32am

Hi,

Thanks for the question, and sorry for our late reply.
We have confirmed that this difference is caused by the leaky ReLU layer, which TensorRT does not support.

Here is a workaround (WAR) that uses standard ReLU + scale + eltwise to approximate leaky ReLU.
It gives working results with the leaky parameter tuned to 0.08.

Could you give it a try?
Please remember to change the threshold back to 0.2.

name: "YOLONet"
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 448
  dim: 448
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 7
    pad: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "relu1"	
}
layer {
  name: "scale1"
  type: "Power"
  bottom: "conv1"
  top: "scale1"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise1"
  type: "Eltwise"
  bottom: "relu1"
  bottom: "scale1"
  top: "layer1"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "layer1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer{
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 192
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "relu2"	
}
layer {
  name: "scale2"
  type: "Power"
  bottom: "conv2"
  top: "scale2"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise2"
  type: "Eltwise"
  bottom: "relu2"
  bottom: "scale2"
  top: "layer2"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "layer2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer{
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 128
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "relu3"	
}
layer {
  name: "scale3"
  type: "Power"
  bottom: "conv3"
  top: "scale3"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise3"
  type: "Eltwise"
  bottom: "relu3"
  bottom: "scale3"
  top: "layer3"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv4"
  type: "Convolution"
  bottom: "layer3"
  top: "conv4"
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "relu4"	
}
layer {
  name: "scale4"
  type: "Power"
  bottom: "conv4"
  top: "scale4"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise4"
  type: "Eltwise"
  bottom: "relu4"
  bottom: "scale4"
  top: "layer4"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv5"
  type: "Convolution"
  bottom: "layer4"
  top: "conv5"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "relu5"	
}
layer {
  name: "scale5"
  type: "Power"
  bottom: "conv5"
  top: "scale5"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise5"
  type: "Eltwise"
  bottom: "relu5"
  bottom: "scale5"
  top: "layer5"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv6"
  type: "Convolution"
  bottom: "layer5"
  top: "conv6"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "conv6"
  top: "relu6"	
}
layer {
  name: "scale6"
  type: "Power"
  bottom: "conv6"
  top: "scale6"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise6"
  type: "Eltwise"
  bottom: "relu6"
  bottom: "scale6"
  top: "layer6"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool6"
  type: "Pooling"
  bottom: "layer6"
  top: "pool6"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer{
  name: "conv7"
  type: "Convolution"
  bottom: "pool6"
  top: "conv7"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "conv7"
  top: "relu7"	
}
layer {
  name: "scale7"
  type: "Power"
  bottom: "conv7"
  top: "scale7"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise7"
  type: "Eltwise"
  bottom: "relu7"
  bottom: "scale7"
  top: "layer7"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv8"
  type: "Convolution"
  bottom: "layer7"
  top: "conv8"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu8"
  type: "ReLU"
  bottom: "conv8"
  top: "relu8"	
}
layer {
  name: "scale8"
  type: "Power"
  bottom: "conv8"
  top: "scale8"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise8"
  type: "Eltwise"
  bottom: "relu8"
  bottom: "scale8"
  top: "layer8"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv9"
  type: "Convolution"
  bottom: "layer8"
  top: "conv9"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu9"
  type: "ReLU"
  bottom: "conv9"
  top: "relu9"	
}
layer {
  name: "scale9"
  type: "Power"
  bottom: "conv9"
  top: "scale9"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise9"
  type: "Eltwise"
  bottom: "relu9"
  bottom: "scale9"
  top: "layer9"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv10"
  type: "Convolution"
  bottom: "layer9"
  top: "conv10"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu10"
  type: "ReLU"
  bottom: "conv10"
  top: "relu10"	
}
layer {
  name: "scale10"
  type: "Power"
  bottom: "conv10"
  top: "scale10"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise10"
  type: "Eltwise"
  bottom: "relu10"
  bottom: "scale10"
  top: "layer10"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv11"
  type: "Convolution"
  bottom: "layer10"
  top: "conv11"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu11"
  type: "ReLU"
  bottom: "conv11"
  top: "relu11"	
}
layer {
  name: "scale11"
  type: "Power"
  bottom: "conv11"
  top: "scale11"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise11"
  type: "Eltwise"
  bottom: "relu11"
  bottom: "scale11"
  top: "layer11"
  eltwise_param {
    operation: SUM
  }
}
layer{
  name: "conv12"
  type: "Convolution"
  bottom: "layer11"
  top: "conv12"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu12"
  type: "ReLU"
  bottom: "conv12"
  top: "relu12"	
}
layer {
  name: "scale12"
  type: "Power"
  bottom: "conv12"
  top: "scale12"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise12"
  type: "Eltwise"
  bottom: "relu12"
  bottom: "scale12"
  top: "layer12"
  eltwise_param {
    operation: SUM
  }
}
layer{
  name: "conv13"
  type: "Convolution"
  bottom: "layer12"
  top: "conv13"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu13"
  type: "ReLU"
  bottom: "conv13"
  top: "relu13"	
}
layer {
  name: "scale13"
  type: "Power"
  bottom: "conv13"
  top: "scale13"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise13"
  type: "Eltwise"
  bottom: "relu13"
  bottom: "scale13"
  top: "layer13"
  eltwise_param {
    operation: SUM
  }
}
layer{
  name: "conv14"
  type: "Convolution"
  bottom: "layer13"
  top: "conv14"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu14"
  type: "ReLU"
  bottom: "conv14"
  top: "relu14"	
}
layer {
  name: "scale14"
  type: "Power"
  bottom: "conv14"
  top: "scale14"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise14"
  type: "Eltwise"
  bottom: "relu14"
  bottom: "scale14"
  top: "layer14"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv15"
  type: "Convolution"
  bottom: "layer14"
  top: "conv15"
  convolution_param {
    num_output: 512
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu15"
  type: "ReLU"
  bottom: "conv15"
  top: "relu15"	
}
layer {
  name: "scale15"
  type: "Power"
  bottom: "conv15"
  top: "scale15"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise15"
  type: "Eltwise"
  bottom: "relu15"
  bottom: "scale15"
  top: "layer15"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv16"
  type: "Convolution"
  bottom: "layer15"
  top: "conv16"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu16"
  type: "ReLU"
  bottom: "conv16"
  top: "relu16"	
}
layer {
  name: "scale16"
  type: "Power"
  bottom: "conv16"
  top: "scale16"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise16"
  type: "Eltwise"
  bottom: "relu16"
  bottom: "scale16"
  top: "layer16"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool16"
  type: "Pooling"
  bottom: "layer16"
  top: "pool16"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}


layer{
  name: "conv17"
  type: "Convolution"
  bottom: "pool16"
  top: "conv17"
  convolution_param {
    num_output: 512
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu17"
  type: "ReLU"
  bottom: "conv17"
  top: "relu17"	
}
layer {
  name: "scale17"
  type: "Power"
  bottom: "conv17"
  top: "scale17"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise17"
  type: "Eltwise"
  bottom: "relu17"
  bottom: "scale17"
  top: "layer17"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv18"
  type: "Convolution"
  bottom: "layer17"
  top: "conv18"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu18"
  type: "ReLU"
  bottom: "conv18"
  top: "relu18"	
}
layer {
  name: "scale18"
  type: "Power"
  bottom: "conv18"
  top: "scale18"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise18"
  type: "Eltwise"
  bottom: "relu18"
  bottom: "scale18"
  top: "layer18"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv19"
  type: "Convolution"
  bottom: "layer18"
  top: "conv19"
  convolution_param {
    num_output: 512
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu19"
  type: "ReLU"
  bottom: "conv19"
  top: "relu19"	
}
layer {
  name: "scale19"
  type: "Power"
  bottom: "conv19"
  top: "scale19"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise19"
  type: "Eltwise"
  bottom: "relu19"
  bottom: "scale19"
  top: "layer19"
  eltwise_param {
    operation: SUM
  }
}



layer{
  name: "conv20"
  type: "Convolution"
  bottom: "layer19"
  top: "conv20"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu20"
  type: "ReLU"
  bottom: "conv20"
  top: "relu20"	
}
layer {
  name: "scale20"
  type: "Power"
  bottom: "conv20"
  top: "scale20"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise20"
  type: "Eltwise"
  bottom: "relu20"
  bottom: "scale20"
  top: "layer20"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv21"
  type: "Convolution"
  bottom: "layer20"
  top: "conv21"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu21"
  type: "ReLU"
  bottom: "conv21"
  top: "relu21"	
}
layer {
  name: "scale21"
  type: "Power"
  bottom: "conv21"
  top: "scale21"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise21"
  type: "Eltwise"
  bottom: "relu21"
  bottom: "scale21"
  top: "layer21"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv22"
  type: "Convolution"
  bottom: "layer21"
  top: "conv22"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 2
  }
}
layer {
  name: "relu22"
  type: "ReLU"
  bottom: "conv22"
  top: "relu22"	
}
layer {
  name: "scale22"
  type: "Power"
  bottom: "conv22"
  top: "scale22"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise22"
  type: "Eltwise"
  bottom: "relu22"
  bottom: "scale22"
  top: "layer22"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "conv23"
  type: "Convolution"
  bottom: "layer22"
  top: "conv23"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu23"
  type: "ReLU"
  bottom: "conv23"
  top: "relu23"	
}
layer {
  name: "scale23"
  type: "Power"
  bottom: "conv23"
  top: "scale23"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise23"
  type: "Eltwise"
  bottom: "relu23"
  bottom: "scale23"
  top: "layer23"
  eltwise_param {
    operation: SUM
  }
}

layer{
  name: "conv24"
  type: "Convolution"
  bottom: "layer23"
  top: "conv24"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu24"
  type: "ReLU"
  bottom: "conv24"
  top: "relu24"	
}
layer {
  name: "scale24"
  type: "Power"
  bottom: "conv24"
  top: "scale24"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise24"
  type: "Eltwise"
  bottom: "relu24"
  bottom: "scale24"
  top: "layer24"
  eltwise_param {
    operation: SUM
  }
}




layer{
  name: "fc25"
  type: "InnerProduct"
  bottom: "layer24"
  top: "fc25"
  inner_product_param {
    num_output: 512
  }
}
layer {
  name: "relu25"
  type: "ReLU"
  bottom: "fc25"
  top: "relu25"	
}
layer {
  name: "scale25"
  type: "Power"
  bottom: "fc25"
  top: "scale25"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise25"
  type: "Eltwise"
  bottom: "relu25"
  bottom: "scale25"
  top: "layer25"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "fc26"
  type: "InnerProduct"
  bottom: "layer25"
  top: "fc26"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu26"
  type: "ReLU"
  bottom: "fc26"
  top: "relu26"	
}
layer {
  name: "scale26"
  type: "Power"
  bottom: "fc26"
  top: "scale26"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise26"
  type: "Eltwise"
  bottom: "relu26"
  bottom: "scale26"
  top: "layer26"
  eltwise_param {
    operation: SUM
  }
}


layer{
  name: "fc27"
  type: "InnerProduct"
  bottom: "layer26"
  top: "result"
  inner_product_param {
    num_output: 1470
  }
}
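
For reference, the pattern repeated after every layer in the prototxt above computes relu(x) + 0.08 * x element-wise. The sketch below compares that with a leaky ReLU. The function names are hypothetical, and the leaky slope of 0.1 used in the usage note is an assumption about the original network rather than something stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Leaky ReLU: identity for positive inputs, a small slope for negative ones.
float leakyRelu(float x, float negativeSlope) {
    return x > 0.0f ? x : negativeSlope * x;
}

// What the ReLU + Power(scale) + Eltwise(SUM) pattern computes:
// relu(x) + scale * x.
float reluPlusScale(float x, float scale) {
    return std::max(x, 0.0f) + scale * x;
}
```

For x = -1 the pattern gives -0.08 where `leakyRelu(-1.0f, 0.1f)` gives -0.1, and for x = 1 it gives 1.08 instead of 1. The workaround is therefore an approximation: it scales positive activations by 1.08 and negative ones by 0.08.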

TLESORT · February 20, 2017, 12:46pm

Hi AastaLLL,

Thank you for your answer!
I tried the solution you proposed, and the results with the 32-bit version of TensorRT are close to the results with Caffe (for example https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection_modified_32bits.jpg).
However, the 16-bit version gives wrong results:
Cat: https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection_modified_16bits.jpg
Matrix of zeros: https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/zeros_detection_modified_16bits.jpg

Thank you very much for your help. Do you think there is anything I can do to make the 16-bit version work?

The code I used to change the mode from 32 bits to 16 bits is:

// NB: builder->platformHasFastFp16() returns true

INetworkDefinition* network = builder->createNetwork();
ICaffeParser *parser = createCaffeParser();
const IBlobNameToTensor *blobNameToTensor =
			parser->parse(prototxt,		// caffe deploy file
			caffemodel,		// caffe model file
			*network,		// network definition that the parser will populate
			nvinfer1::DataType::kHALF);
[...]

builder->setHalf2Mode(true);

AastaLLL · February 21, 2017, 9:46am

Hi,

Thanks for your feedback.
We are investigating the fp16 issue and will update you later.

AastaLLL · March 17, 2017, 10:34am

Hi,

Thank you for your patience.
We found that YOLO is quite sensitive to the network's precision, and we need to debug it further.

Sorry for keeping you waiting but we still need some time to figure out the root cause.
Thanks.

ericbin · March 27, 2017, 3:31am

Hi, I found a bug in your code. You should change WrapInputLayer2Bgr to:

void WrapInputLayer2Bgr(std::vector<std::vector<cv::Mat> >& input_channels, float* buffer) {
	for (int n = 0; n < BATCH_SIZE; ++n) {
		// Start of image n in the planar (CHW) input buffer.
		float* image = buffer + n * 3 * INPUT_H * INPUT_W;
		input_channels.push_back(std::vector<cv::Mat>());
		for (int i = 0; i < 3; ++i) {
			// Wrap the planes in reverse order (plane 2, then 1, then 0)
			// so that the channel order is swapped when the image is written.
			float* input_data = image + (2 - i) * INPUT_H * INPUT_W;
			cv::Mat channel(INPUT_H, INPUT_W, CV_32FC1, input_data);
			input_channels[n].push_back(channel);
		}
	}
}

and change this line in the function ‘Preprocess’ from

if (_rescaleTo01)
	sample_float = sample_float / 255.f;

to

if (_rescaleTo01)
	sample_float = sample_float / 127.5 - 1;

Then the result will be close to darknet/yolo.
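
The two changes above (reversed channel planes and a [-1, 1] input range) can be sketched without OpenCV as a single conversion. This is only an illustration: `toPlanarReversed` is a hypothetical name, and it assumes the source image is an interleaved 8-bit H x W x 3 buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one interleaved 8-bit image (H x W x 3) into a planar float
// buffer (3 x H x W) with the channel order reversed and pixel values
// mapped from [0, 255] to [-1, 1].
std::vector<float> toPlanarReversed(const std::vector<std::uint8_t>& hwc,
                                    int height, int width) {
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    assert(hwc.size() == plane * 3);
    std::vector<float> chw(plane * 3);
    for (std::size_t p = 0; p < plane; ++p) {
        for (int c = 0; c < 3; ++c) {
            // Source channel c goes to destination plane (2 - c).
            const float v = static_cast<float>(hwc[p * 3 + c]);
            chw[(2 - c) * plane + p] = v / 127.5f - 1.0f;
        }
    }
    return chw;
}
```

The returned buffer has the same layout as the one wrapped by WrapInputLayer2Bgr, so it could be copied to the input binding directly.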
If you want the result to match darknet/yolo exactly, you can add an activation function that implements the modified ReLU used in the TensorRT workaround (negative slope: 0.08, positive slope: 1.08), and convert the weights file to a caffemodel.
However, I don't know why the input image range has to change from [0.0, 1.0] to [-1.0, 1.0]. Could anyone explain the reason?

BTW: I don't think YOLO is sensitive to the network's precision. If you use the caffemodel produced by those steps, you will find that the results of fp16 mode and fp32 mode match almost exactly. I made a ‘pseudo-fp16 darknet’ to generate an fp16 weights file, and the result is hard to tell apart from the one with the fp32 weights file.
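
The ‘pseudo-fp16’ code itself is not shown in this thread. As a rough sketch of the idea, one can round each fp32 weight to the nearest value that half precision can represent and keep storing it as a float. `roundToHalf` below is a hypothetical helper written for illustration, under the assumption of the default round-to-nearest-even mode; it is not how TensorRT converts weights internally.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Round a float to the nearest value representable in IEEE 754 half
// precision (1 sign, 5 exponent, 10 fraction bits), returned as a float.
float roundToHalf(float x) {
    if (x == 0.0f || std::isnan(x) || std::isinf(x)) return x;
    const float ax = std::fabs(x);
    float rounded;
    if (ax < std::ldexp(1.0f, -14)) {
        // Subnormal half range: values are spaced 2^-24 apart.
        rounded = std::ldexp(std::nearbyint(std::ldexp(ax, 24)), -24);
    } else {
        int e = 0;
        const float m = std::frexp(ax, &e);  // ax = m * 2^e, m in [0.5, 1)
        // Keep 11 significant bits (implicit leading bit + 10 fraction bits).
        rounded = std::ldexp(std::nearbyint(std::ldexp(m, 11)), e - 11);
    }
    // Beyond the largest finite half (65504) the value overflows to infinity.
    if (rounded > 65504.0f) rounded = std::numeric_limits<float>::infinity();
    return std::copysign(rounded, x);
}
```

For example, 0.1f rounds to 0.0999755859375 and 0.08f to 0.08001708984375, which gives an idea of the per-weight error introduced before any accumulation effects.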

ericbin · March 27, 2017, 3:32am

Sorry for the mistake in my last post. It should read: "you should change the function WrapInputLayer to WrapInputLayer2Bgr".

jack_2017 · July 29, 2017, 1:59pm

Is there any update on this topic?
Where can I find the user guide and examples of tensorRT 2.1?

AastaLLL · July 31, 2017, 2:10am

Hi,

TensorRT2.1 is available with JetPack3.1.

This sample is based on TensorRT1.0. There are some API changes, but not much.

r1ch13r1ch · September 25, 2017, 7:08am

Hi,

Is there any update about fp16 issue?

Thanks.

AastaLLL · October 2, 2017, 6:25am

Hi,

We found that YOLO is pretty sensitive to precision, and fp16 mode slightly lowers the output precision.
If you want fp16 acceleration, we recommend training the YOLO model directly in fp16 mode.

Thanks.

DaLT · November 17, 2017, 10:56am

Hi @TLESORT,
I was wondering what the FPS results are when using TensorRT FP32 and FP16 on YOLO V2?

bhargavK · January 17, 2018, 1:16am

I have the same question! Has anyone worked on it yet?

AastaLLL · January 22, 2018, 7:14am

Hi, bhargavK

Could you share more information about your question with us?

YOLO can run correctly with TensorRT in float mode.
Have you tested it?

Thanks.

bhargavK · January 22, 2018, 8:00pm

Hi AstaLLL,

Thanks for your reply. I haven’t tested it yet. I was more curious about its performance.

I will test it soon and report the FPS if I get the time, otherwise, I will keep using DetectNet for now.

ZeeshanHayderr · May 21, 2018, 12:10pm

Hi all,
I am trying to run Tiny YOLO version 2 with TensorRT optimization. I am giving the input image in BGR format with values in the range [0, 1]. I have approximated leaky ReLU with a ReLU + scale + eltwise operation. I take the output at the second-to-last layer, which is a convolution layer whose output is a 12x12x125 tensor. I have implemented the last detection layer separately in Python.
Everything works fine using Caffe, but TensorRT is not giving the correct output, or maybe I am interpreting the output wrongly.

Since I take the output at the second-to-last layer, TensorRT gives me the output as an NCHW linearized array of size 1x125x12x12 = 18000.

I take this output from TensorRT, reshape it to 125x12x12 and send it to my Python detection layer. I am not getting correct results, whereas with the Caffe implementation I am.
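
One thing worth checking in a setup like this is whether the detection code reads the buffer in the same order it was written. The sketch below shows how an element is addressed in a linearized CHW buffer and how to reorder it to HWC. The function names are hypothetical, and whether the detection layer expects channels-last data is an assumption about that code, not something established here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (c, h, w) in a single-image CHW buffer.
std::size_t chwOffset(int c, int h, int w, int height, int width) {
    return (static_cast<std::size_t>(c) * height + h) * width + w;
}

// Reorder a CHW buffer into HWC, for a consumer that expects
// channels-last data (an assumption about the consumer).
std::vector<float> chwToHwc(const std::vector<float>& chw,
                            int channels, int height, int width) {
    assert(chw.size() == static_cast<std::size_t>(channels) * height * width);
    std::vector<float> hwc(chw.size());
    for (int c = 0; c < channels; ++c)
        for (int h = 0; h < height; ++h)
            for (int w = 0; w < width; ++w)
                hwc[(static_cast<std::size_t>(h) * width + w) * channels + c] =
                    chw[chwOffset(c, h, w, height, width)];
    return hwc;
}
```

With 125 channels on a 12x12 grid, the last element (c = 124, h = 11, w = 11) sits at offset 17999, which matches the 18000-element array described above.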

Please tell me what I am doing wrong, either giving input in wrong format or getting output in wrong format.

Thanks in advance…

dht7166 · July 6, 2018, 4:02pm

I have been getting the same error, with boxes everywhere, even with the suggested scale and eltwise layers. I would also like to ask whether the scale of 0.08 means anything in particular. Should it be changed to adapt to a different model?

I am using a tensorflow model, with tiny YOLOv2. The implementation is from the repo basic-yolo-keras.

AastaLLL · July 13, 2018, 7:28am

Hi,

The suggestion was posted a year ago and is no longer up to date.
Could you create a new topic and explain your issue in detail?

Thanks.
