Issue when deploying a simple classification model with DeepStream 5.0 on JetPack 4.4

• Network Type (Classification)
• TLT Version (2.0)
• Hardware (Jetson TX-2)
• DeepStream (5.0)
• JetPack (4.4)

Hello,
We are learning TLT and trying to prototype a simple image classification model with DeepStream. JPG images (width=28 x height=62 x 3 channels) are classified into three classes: green, yellow, and red.

After collecting some training images and saving them in dataset/test/green, dataset/test/yellow, … dataset/val/green, etc., we started training with TLT 2.0.
tlt-train, tlt-evaluate, tlt-infer, and tlt-export all worked well, and we got test.etlt to deploy.
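
For reference, the export step looked roughly like this (a sketch; the model path and key are placeholders for our actual values):

tlt-export classification \
  -m /workspace/output/weights/resnet_010.tlt \
  -k $KEY \
  -o test.etlt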

Then we used the following gst-launch script to test:

gst-launch-1.0 filesrc location="data/input.jpg" ! jpegparse ! nvv4l2decoder ! queue ! \
  nvstreammux0.sink_0 nvstreammux name=nvstreammux0 width=28 height=62 batch-size=1 batched-push-timeout=40000 live-source=TRUE ! queue ! \
  nvvideoconvert ! queue ! \
  nvinfer config-file-path="test.txt" ! queue ! \
  nvdsosd process-mode=HW_MODE ! queue ! \
  nvoverlaysink sync=false

and we got the error below. As it suggests, we are stuck on the numbers 1536 and 3072, and we are wondering whether this issue is on the TLT side or the DeepStream side. Any tip or guidance is really appreciated. Thanks a lot.

Setting pipeline to PAUSED ...
Opening in BLOCKING MODE 
0:00:00.198450599 31688   0x555a3ce130 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1715> [UID = 1]: Trying to create engine from model files
ERROR: [TRT]: predictions/MatMul: kernel weights has count 1536 but 3072 was expected
ERROR: [TRT]: predictions/MatMul: kernel weights has count 1536 but 3072 was expected
ERROR: [TRT]: predictions/MatMul: kernel weights has count 1536 but 3072 was expected
ERROR: [TRT]: UffParser: Parser error: predictions/BiasAdd: The input to the Scale Layer is required to have a minimum of 3 dimensions.
parseModel: Failed to parse UFF model

The model is resnet-10, and the tail of the model summary shows:

block_4a_relu (Activation)      (None, 512, 2, 4)    0           add_4[0][0]                      
__________________________________________________________________________________________________
avg_pool (AveragePooling2D)     (None, 512, 1, 1)    0           block_4a_relu[0][0]              
__________________________________________________________________________________________________
flatten (Flatten)               (None, 512)          0           avg_pool[0][0]                   
__________________________________________________________________________________________________
predictions (Dense)             (None, 3)            1539        flatten[0][0]                    
==================================================================================================
Total params: 4,917,443
Trainable params: 4,824,323
Non-trainable params: 93,120
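
For context on the numbers in the error: the Dense layer's 1,539 parameters are 512 × 3 = 1,536 kernel weights plus 3 biases, so the "count 1536" matches this summary, while the expected 3072 = 1024 × 3 would mean TensorRT derived a flattened feature size of 1024 instead of 512.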

The training spec file:

model_config {

  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet', 'darknet']

  arch: "resnet"

  # for resnet --> n_layers can be [10, 18, 34, 50, 101]
  # for vgg --> n_layers can be [16, 19]
  # for darknet --> n_layers can be [19, 53]

  n_layers: 10
  use_bias: False
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False
  freeze_blocks: 0
  freeze_blocks: 1

  # image size should be "3, X, Y", where X,Y >= 16
  input_image_size: "3,28,62"
}


train_config {
  train_dataset_path: "/workspace/dataset/train"
  val_dataset_path: "/workspace/dataset/val"
  # optimizer can be chosen from ['adam', 'sgd']

  optimizer: "sgd"
  batch_size_per_gpu: 32
  n_epochs: 10
  n_workers: 2

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005

  }

  # learning_rate

  lr_config {

    # "step" and "soft_anneal" are supported.

    scheduler: "soft_anneal"

    # "soft_anneal" stands for soft annealing learning rate scheduler.
    # the following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
    # "step" stands for step learning rate scheduler.
    # the following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1

    # "cosine" stands for soft start cosine learning rate scheduler.
    # the following 2 parameters should be specified if "cosine" is used.
    # learning_rate: 0.05
    # soft_start: 0.01
  }
}

The nvinfer config file test.txt:

[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=0
labelfile-path=test_labels.txt
tlt-encoded-model=test.etlt
tlt-model-key=[my_key]
infer-dims=3;62;28
uff-input-blob-name=input_1
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
interval=0
gie-unique-id=1
#0=Detection 1=Classification 2=Segmentation
network-type=1
scaling-filter=1
scaling-compute-hw=1
output-blob-names=predictions/Softmax
classifier-threshold=0.5

Did you use a pretrained model during training?

Please deploy with deepstream-app.
See Image Classification — TAO Toolkit 3.0 documentation

More reference: Issue with image classification tutorial and testing with deepstream-app - #12 by Morganh
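
For example, a minimal way to wire test.txt into deepstream-app is through the [primary-gie] group of the top-level app config (a sketch; the other groups such as [source0] and [sink0] are assumed to exist, and deepstream_app_config.txt is a placeholder name):

[primary-gie]
enable=1
config-file=test.txt

deepstream-app -c deepstream_app_config.txt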

No. Actually, we intentionally commented out the pretrained model file in the training spec file.

Sure. We will give deepstream-app a try.

Besides the gst-launch script, we also tried the classifier example in NVIDIA-AI-IOT/deepstream_tao_apps:
https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_classifier

With some modifications, we still got the same error.

To narrow it down: did you ever use the tlt-converter tool (inside the docker) to successfully generate a TensorRT engine?

We will give it a try. I am glad you brought it up.
tlt-converter is a bit confusing: we thought it was supposed to run on the TX-2, since it is used to optimize the engine for that specific hardware platform. Thanks.

For tlt-converter: by default, it is available in the TLT 2.0 docker.
If you want to generate the TensorRT engine on the TX2, please download the tlt-converter (Jetson version) onto the TX2. See the TLT 2.0 user guide, Deploying to Deepstream — Transfer Learning Toolkit 2.0 documentation.

For the Jetson platform, the tlt-converter is available to download in the dev zone.
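
A typical invocation on the TX2 looks roughly like this (a sketch; the key, file names, and precision are placeholders, -d takes the model input dims in C,H,W order, and -o names the output blob):

./tlt-converter test.etlt \
  -k $KEY \
  -d 3,28,62 \
  -o predictions/Softmax \
  -e test.engine \
  -t fp16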

Got it. Thanks a lot.

By using tlt-converter on both the PC and the TX-2 back and forth, we were able to identify the trivial bug in the nvinfer config: infer-dims=3;62;28 does not match the training input_image_size of "3,28,62", so it needs to be infer-dims=3;28;62.

Thanks a lot