TensorRT 4 - Problem with data layer when running Caffe from Digits with TRT

Hello,

I want to inference Caffe model trained by DIGITS on Jetson via TRT 4 with C++ api. So far I made TensorFlow models trained by DIGITS work but not Caffe.

The problem is in the parser method (ICaffeParser->parse(…)).

The C++ CaffeParser can not parse model made by DIGITS successfully. I am trying to do simple MNIST LeNet network for grayscale images… So I trained Caffe model in DIGITS (I left the default description of the network) and ran inference via DIGITS. It is working. Now I downloaded the snapshot and trying to parse it in C++ but the Data type layer which is DIGITS using is not supported by TRT.

auto blobNameToTensor = mCaffeParser->parse(aFilePrototxt.toUtf8(), aFileCaffemodel.toUtf8(), *trtNetwork, aNetworkDataType);

In logs:

could not parse layer type Data

The default site definition from newest Digits:

# LeNet
name: "LeNet"
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    batch_size: 64
  }
  include { stage: "train" }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    batch_size: 32
  }
  include { stage: "val" }
}
layer {
  # Use Power layer for input scaling
  name: "scale"
  bottom: "data"
  top: "scaled"
  type: "Power"
  power_param {
    # 1/(standard deviation on MNIST dataset)
    scale: 0.0125
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "scaled"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    # Since num_output is unset, DIGITS will automatically set it to the
    #   number of classes in your dataset.
    # Uncomment this line to set it explicitly:
    #num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { stage: "val" }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
  exclude { stage: "deploy" }
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "ip2"
  top: "softmax"
  include { stage: "deploy" }
}

Is there a way how to redefine the input layers of type Data to be supported by TRT?

Thank you.

edit: Does not work with TRT5 too.

Hello,

Layer type “data” is not supported by the TRT parser. You’ll need to implement a custome layer plugin. Please reference: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#add_custom_layer

I changed the network to this:

name: "LeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  top: "label"
  input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
}
layer {
  # Use Power layer for input scaling
  name: "scale"
  bottom: "data"
  top: "scaled"
  type: "Power"
  power_param {
    # 1/(standard deviation on MNIST dataset)
    scale: 0.0125
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "scaled"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    # Since num_output is unset, DIGITS will automatically set it to the
    #   number of classes in your dataset.
    # Uncomment this line to set it explicitly:
    #num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { stage: "val" }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
  exclude { stage: "deploy" }
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "ip2"
  top: "softmax"
  include { stage: "deploy" }
}

I will implement the custom layer later.

But now I have another problem which I do not understand much.

auto trtNetwork = trtBuilder->createNetwork();

auto blobNameToTensor = mCaffeParser->parse(aFilePrototxt.toUtf8(), aFileCaffemodel.toUtf8(), *trtNetwork, aNetworkDataType);
--------------------------------------------------------------------------------------------------------

Error:
INFO: Using kFLOAT DataType
[libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/x86_64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_): 
FAIL!  : testTrtTfEngine::testCaffeLoadModel() Caught unhandled exception
   Loc: [qtestcase.cpp(1901)]
Totals: 1 passed, 1 failed, 0 skipped, 0 blacklisted, 202ms
********* Finished testing of testTrtTfEngine *********
terminate called after throwing an instance of 'google_private::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_):

Downloaded model from Nvidia Digits:
https://drive.google.com/open?id=1NlIqVl6tVgwPMaUcwvjMVC82bkJtxrTs

My bad, was loading bad prototxt. Digits is creating train_val.prototxt solver.prototxt original.prototxt deploy.prototxt

I was using original.protobuf… So I changed it to use deploy.prototxt and now the parsing is working.