Issue with image classification tutorial and testing with deepstream-app

Description

Hi,

I am trying to train an image classification model following the NVIDIA TLT tutorial (Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation). My dataset contains 3 classes: Good, Leaked and Scratched.

I set up the spec file as described in the tutorial (Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation).

Here it is:

model_config {

  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet', 'darknet']

  arch: "resnet"
  
  # for resnet --> n_layers can be [10, 18, 34, 50, 101]
  # for vgg --> n_layers can be [16, 19]
  # for darknet --> n_layers can be [19, 53]


  n_layers: 18
  use_bias: True
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False
  freeze_blocks: 0
  freeze_blocks: 1

  # image size should be "3, X, Y", where X,Y >= 16
  input_image_size: "3,224,224"
}

eval_config {
  eval_dataset_path: "/workspace/testing-data"
  model_path: "/workspace/results/weights/resnet_008.tlt"
  top_k: 3
  batch_size: 8
  n_workers: 8
}

train_config {
  train_dataset_path: "/workspace/training-data"
  val_dataset_path: "/workspace/testing-data"
  #pretrained_model_path: "/path/to/your/pretrained/model"
  # optimizer can be chosen from ['adam', 'sgd']

  optimizer: "sgd"
  batch_size_per_gpu: 8
  n_epochs: 8
  n_workers: 16

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate

  lr_config {

    # "step" and "soft_anneal" are supported.

    scheduler: "soft_anneal"

    # "soft_anneal" stands for soft annealing learning rate scheduler.
    # the following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
    # "step" stands for step learning rate scheduler.
    # the following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1

    # "cosine" stands for soft start cosine learning rate scheduler.
    # the following 2 parameters should be specified if "cosine" is used.
    # learning_rate: 0.05
    # soft_start: 0.01
  }
}
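
For reference, this spec is consumed by the tutorial's training step, roughly like this (a sketch; the spec filename and key are placeholders, and the flags are the standard ones from the TLT 2.0 classification workflow):

tlt-train classification -e ./classification_spec.cfg -r /workspace/results -k <my pass>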

I resized my training and testing images to 244x244.

Then I just followed the tutorial to train, evaluate and run inference (using tlt-infer). Interestingly, once I got my tlt model and pointed tlt-infer at various test images, I got the correct labels back. Here is the tail of the output produced by tlt-infer:

avg_pool (AveragePooling2D)     (None, 512, 1, 1)    0           block_4b_relu[0][0]              
__________________________________________________________________________________________________
flatten (Flatten)               (None, 512)          0           avg_pool[0][0]                   
__________________________________________________________________________________________________
predictions (Dense)             (None, 3)            1539        flatten[0][0]                    
==================================================================================================
Total params: 11,549,827
Trainable params: 11,372,675
Non-trainable params: 177,152
__________________________________________________________________________________________________
2021-01-13 14:13:49,186 [INFO] iva.makenet.scripts.inference: Processing ./testing-data/Good2/Good_2020_06_29__12_17_42_148169.jpg...
2021-01-13 14:13:49.609370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-01-13 14:13:49.888716: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Current predictions: [[9.9969721e-01 1.0639867e-05 2.9223689e-04]]
Class label = 0
Class name = Good2

Then I exported the model to the .etlt format to be used on the Jetson Nano:

tlt-export classification -m ./results/weights/resnet_008.tlt -k <my pass> --data_type fp16 -o ./resnet_008.etlt
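
(For reference, the exported .etlt can also be converted to a TensorRT engine directly on the Nano with tlt-converter, instead of letting deepstream-app build it on first run; a sketch, assuming the Jetson tlt-converter binary and the same key and input dimensions as above:)

./tlt-converter -k <my pass> -o predictions/Softmax -d 3,224,224 -i nchw -t fp16 -m 1 -e resnet_008.etlt_b1_gpu0_fp16.engine resnet_008.etlt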

I also created an mp4 video from my test image set (Good, Leaked and Scratched) so I could feed it into deepstream-app.
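
(For reference, such a video can be assembled from the still images with ffmpeg along these lines; the frame rate and paths are placeholders:)

ffmpeg -framerate 2 -pattern_type glob -i './testing-data/*/*.jpg' -pix_fmt yuv420p -c:v libx264 ../video/testing.mp4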

Then I created the config files for deepstream-app (again as per the instructions: Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation).

Here is my version:

[property]
gpu-id=0
net-scale-factor=1
offsets=123.675;116.28;103.53
#model-color-format=1
batch-size=1

tlt-model-key=<my pass>
tlt-encoded-model=../models/resnet_008.etlt
model-engine-file=../models/resnet_008.etlt_b1_gpu0_fp16.engine

labelfile-path=labels.txt

infer-dims=3;224;224
uff-input-blob-name=input_1

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2

process-mode=1
interval=0
network-type=1
gie-unique-id=1
output-blob-names=predictions/Softmax
classifier-threshold=0.6

One thing to note about the labels example in the tutorial: it seems there is an error. The categories in the labels file should be listed one after another, separated by ';', not one per line as instructed in the tutorial. So my labels file just contains: Good;Leaked;Scratched;

Also, I configured deepstream-app to output the result overlay via RTSP. Here is my config file for the app:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../video/testing.mp4
num-sources=8
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink2]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=8
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=768
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
## If set to TRUE, system timestamp will be attached as ntp timestamp
## If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
# attach-sys-ts-as-ntp=1

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
nvbuf-memory-type=0
config-file=config_infer.txt

Finally I ran deepstream-app with the specified config file; the engine file got built, the app performed inference, and the result overlay could be viewed with ffplay via RTSP. Unfortunately, this is where the model stops working. For some reason I only ever get the 'Leaked' label displayed.

I have played with classifier-threshold. Initially I had it at 0.2. If I set it to 0.8, the 'Leaked' label sometimes disappeared. When I set net-scale-factor to 0.5 I got the Leaked and Scratch labels displayed, but they weren't related to what was on screen at that moment.

I have tried increasing the number of epochs to 30 during training, as well as pruning and retraining, but it didn't make any difference.

So where did I make a mistake in following the tutorial?

Environment

Training machine with GeForce GTX 1650:
TensorFlow: 1.15
CUDA: 10
Training docker container: nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3

Deployment Jetson Nano:
JetPack 4.4
TensorRT 7.1.3 + CUDA 10.2

Hi @dzmitry.babrovich
First, could you please run inference (via tlt-infer) against a set of test images?

Hi @Morganh,
As I mentioned above, I did run tlt-infer against a set of test images; I included the output it produced for an image from the Good set. I'll try again and post more results from the different categories of the test set.

Hi @Morganh

So I have selected a few example images from the testing set, across the different categories; here are the results:

Command:
tlt-infer classification -m ./results/weights/resnet_008.tlt -k -cm ./results/classmap.json -i ./testing-data/Good2/Good_2020_06_29__12_17_42_148169.jpg
Result:
Current predictions: [[9.9862862e-01 6.6816618e-05 1.3046165e-03]]
Class label = 0
Class name = Good2

Command:
tlt-infer classification -m ./results/weights/resnet_008.tlt -k -cm ./results/classmap.json -i ./testing-data/Scratch2/Scratch_2020_07_06__11_32_26_253208.jpg
Result:
Current predictions: [[1.0612786e-06 5.9442841e-06 9.9999297e-01]]
Class label = 2
Class name = Scratch2

Command:
tlt-infer classification -m ./results/weights/resnet_008.tlt -k -cm ./results/classmap.json -i ./testing-data/Leakage2/Leaker_2020_07_02__17_28_58_504982.jpg
Result:
Current predictions: [[1.5853602e-05 9.9969363e-01 2.9050530e-04]]
Class label = 1
Class name = Leakage2

So the trained tlt model does seem to work. I understand that this selection is not representative, but it gives me some confidence that the trained model can classify images from the different categories.

Thanks for the info. So your tlt-infer works well.
BTW, you can run inference in directory mode to process a whole set of test images.
According to the Jupyter notebook, the sample command is as below:

tlt-infer classification -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
  -k $KEY -b 32 -d $DATA_DOWNLOAD_DIR/split/test/person \
  -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

The inference result will be saved in test/person/result.csv

Hi @Morganh

Thanks for the hint. I've run the inference command against my testing images as per your advice and the results look good to me. Please see the attached files. I issued the tlt-infer command against each category directory (hence 3 files): result_original_Good.csv (87.5 KB), result_original_Leakage.csv (64.1 KB), result_original_Scratch.csv (84.4 KB)

Please refer to

How about running an mp4 file instead of rtsp?
Could you provide more details about "Unfortunately this is where the model stops working"? Any log?

Firstly, please run the example below and make sure it can run.

Then try the config with your rtsp instead of the mp4 file, and make sure it can run.

If the above works, please check your deepstream config again, along with the inference config file.
In your inference config file, please modify:
infer-dims=3;224;224 to infer-dims=3;224;224;0
net-scale-factor=1 to net-scale-factor=1.0

Also, there are actually two ways of running a classification model with deepstream:

  1. Run as primary gie
  2. Run as secondary gie

If set as primary gie,
please set process-mode=1 in the inference config file.

If set as secondary gie,
please set process-mode=2 in the inference config file.
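
For context, in the deepstream-app config these two modes map to different sections. A minimal sketch based on the stock sample configs (the secondary section name and operate-on-gie-id follow the standard deepstream-app samples; the gie-unique-id values and the secondary config filename here are assumptions):

# classifier as primary gie: full-frame classification
# (its nvinfer config has process-mode=1)
[primary-gie]
enable=1
gie-unique-id=1
config-file=config_infer.txt

# classifier as secondary gie: classifies objects produced by an upstream detector
# (its nvinfer config has process-mode=2; the filename is a placeholder)
[secondary-gie0]
enable=1
gie-unique-id=2
operate-on-gie-id=1
config-file=config_infer_secondary.txt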

Hi @Morganh,

Changing infer-dims=3;224;224 to infer-dims=3;224;224;0 produces the error:

Error. 'infer-dims' array length is 4. Should be 3 as [c;h;w] order.
Failed to parse group property

Did you mean input-dims=3;224;224;0?

I did that and it didn’t make any difference.

In the meantime I have been trying a different path: training a TensorFlow model and converting it to ONNX (Transfer learning and fine-tuning | TensorFlow Core). I managed to do it using ResNet50V2 as a base (ResNet50 didn't work, since it had to be converted to ONNX with opset 10, which the Jetson Nano's TensorRT doesn't support), and it all worked; the conversion command is sketched after the config below. I used this config file:

[property]
gpu-id=0
batch-size=1

labelfile-path=labels.txt
onnx-file=../models/fine_model_resnet50v2.onnx
model-engine-file=../models/fine_model_resnet50v2.onnx_b1_gpu0_fp16.engine

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2

process-mode=1
interval=0
network-type=1
gie-unique-id=1
classifier-threshold=0.8
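
The ONNX conversion itself was along these lines (a sketch, assuming a TensorFlow SavedModel export and the standard tf2onnx CLI; the paths and opset value are placeholders):

python -m tf2onnx.convert --saved-model ./fine_model_resnet50v2_saved --opset 11 --output ../models/fine_model_resnet50v2.onnx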

I am going to give NVIDIA's tutorial another go and train a ResNet model with 50 layers to see if it makes any difference. It does take a lot of time, though, to train on about 7000 images.

I think you can close this ticket since I found a different solution to my problem, but I do think the tutorial needs polishing.

Yes, it is input-dims=3;224;224;0

Glad to see that you have the solution now.

Actually, when you deploy the tlt model via the config file, it should work.
For example, if you trained a two-class (person and another class) model with the TLT classification network, then you can run inference in deepstream in the two ways below.

  1. Work as primary trt engine
    ds_classification_as_primary_gie (3.4 KB)
    config_as_primary_gie.txt (741 Bytes)

nvidia@nvidia:/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app$ deepstream-app -c ds_classification_as_primary_gie

  2. Work as secondary trt engine
    ds_classification_as_secondary_gie (3.6 KB)
    config_as_secondary_gie.txt (741 Bytes)

nvidia@nvidia:/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app$ deepstream-app -c ds_classification_as_secondary_gie

Yes, as I said, I configured the three-class model as the primary trt engine and it didn't work for me. I only get one label displayed, irrespective of the image, as I described above.

Please refer to the way I mentioned above.
Note that the two lines below are not changed:

net-scale-factor=1.0
offsets=123.67;116.28;103.53

And please add the line below to your infer spec:

num-detected-classes=3
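
Putting these suggestions together, the [property] section of the infer config posted earlier would then look roughly like this (a sketch: only net-scale-factor, input-dims and num-detected-classes change, everything else is carried over unchanged):

[property]
gpu-id=0
net-scale-factor=1.0
offsets=123.675;116.28;103.53
#model-color-format=1
batch-size=1
tlt-model-key=<my pass>
tlt-encoded-model=../models/resnet_008.etlt
model-engine-file=../models/resnet_008.etlt_b1_gpu0_fp16.engine
labelfile-path=labels.txt
input-dims=3;224;224;0
uff-input-blob-name=input_1
network-mode=2
process-mode=1
interval=0
network-type=1
gie-unique-id=1
num-detected-classes=3
output-blob-names=predictions/Softmax
classifier-threshold=0.6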

Hi @Morganh ,
I could integrate and run my classification model with deepstream, but the classified outputs are wrong in deepstream. With tlt-infer and a standalone Python program, the classification is correct. Do I need to change some parameters in the config file to get the results right?

Do I need to do some RGB to BGR conversion in the configuration file? @Morganh

The model-color-format should be "1" for a BGR configuration.

model-color-format = 1

You can also try changing the line below, to check if it helps (these look like the ImageNet per-channel means in BGR order; nvinfer subtracts the offsets from each channel before applying net-scale-factor, so they need to match the training-time preprocessing):

offsets=123.67;116.28;103.53

to

offsets=103.939;116.779;123.68

Thanks @Morganh ,
Yeah, my model-color-format is set to 1, and I have replaced the previous offsets with 103.939;116.779;123.68, but still all the frames that are supposed to belong to the positive class are predicted as the negative class…

Thanks for the info. I will check further.


@jazeel.jk
As mentioned above, please modify to

offsets=103.939;116.779;123.68
model-color-format=1

I confirm that it gets the same result as tlt-infer.
Working as the primary trt engine:
ds_classification_as_primary_gie (3.4 KB)
config_as_primary_gie (3).txt (743 Bytes)

Also, please double-check your label file.
Yours should be:

negative;positive

@Morganh ,
I checked it. I used the same ds_classification_as_primary_gie config file for deepstream-app, and the label file is in the order of classmap.json, meaning the first one is negative and the second one is positive.
Is there a way to get the predicted outputs printed on the terminal?

With the steps I mentioned above, the predicted output will be shown in the top-left corner of the monitor.
For other ways, please search or ask in the DeepStream forum.