Description
Hi,
I am trying to train an image classification model following the NVIDIA TLT tutorial (Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation). The dataset contains 3 classes: Good, Leaked and Scratched.
I set up the spec file as described in the tutorial (Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation).
Here it is:
model_config {
# Model architecture can be chosen from:
# ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet', 'darknet']
arch: "resnet"
# for resnet --> n_layers can be [10, 18, 34, 50, 101]
# for vgg --> n_layers can be [16, 19]
# for darknet --> n_layers can be [19, 53]
n_layers: 18
use_bias: True
use_batch_norm: True
all_projections: True
use_pooling: False
freeze_bn: False
freeze_blocks: 0
freeze_blocks: 1
# image size should be "3, X, Y", where X,Y >= 16
input_image_size: "3,224,224"
}
eval_config {
eval_dataset_path: "/workspace/testing-data"
model_path: "/workspace/results/weights/resnet_008.tlt"
top_k: 3
batch_size: 8
n_workers: 8
}
train_config {
train_dataset_path: "/workspace/training-data"
val_dataset_path: "/workspace/testing-data"
#pretrained_model_path: "/path/to/your/pretrained/model"
# optimizer can be chosen from ['adam', 'sgd']
optimizer: "sgd"
batch_size_per_gpu: 8
n_epochs: 8
n_workers: 16
# regularizer
reg_config {
type: "L2"
scope: "Conv2D,Dense"
weight_decay: 0.00005
}
# learning_rate
lr_config {
# "step" and "soft_anneal" are supported.
scheduler: "soft_anneal"
# "soft_anneal" stands for soft annealing learning rate scheduler.
# the following 4 parameters should be specified if "soft_anneal" is used.
learning_rate: 0.005
soft_start: 0.056
annealing_points: "0.3, 0.6, 0.8"
annealing_divider: 10
# "step" stands for step learning rate scheduler.
# the following 3 parameters should be specified if "step" is used.
# learning_rate: 0.006
# step_size: 10
# gamma: 0.1
# "cosine" stands for soft start cosine learning rate scheduler.
# the following 2 parameters should be specified if "cosine" is used.
# learning_rate: 0.05
# soft_start: 0.01
}
}
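For reference, the training and evaluation steps were the standard ones from the tutorial, roughly like this (the spec file name is just what I called it, the key is a placeholder, and the flags are the ones from the TLT 2.0 classification docs):
tlt-train classification -e ./classification_spec.cfg -r /workspace/results -k <my pass>
tlt-evaluate classification -e ./classification_spec.cfg -k <my pass>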
I resized my training and testing images to 244x244.
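I did the resizing in bulk with ImageMagick, roughly like this (assuming mogrify is available; the '!' forces the exact size and the files are overwritten in place):
mogrify -resize '244x244!' ./training-data/*/*.jpg
mogrify -resize '244x244!' ./testing-data/*/*.jpg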
Then I just followed the tutorial to train, evaluate and run inference (using tlt-infer). Interestingly, once I had my .tlt model and pointed tlt-infer at various test images, I got the correct labels back. Here is the tail of the output produced by tlt-infer:
avg_pool (AveragePooling2D) (None, 512, 1, 1) 0 block_4b_relu[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 512) 0 avg_pool[0][0]
__________________________________________________________________________________________________
predictions (Dense) (None, 3) 1539 flatten[0][0]
==================================================================================================
Total params: 11,549,827
Trainable params: 11,372,675
Non-trainable params: 177,152
__________________________________________________________________________________________________
2021-01-13 14:13:49,186 [INFO] iva.makenet.scripts.inference: Processing ./testing-data/Good2/Good_2020_06_29__12_17_42_148169.jpg...
2021-01-13 14:13:49.609370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-01-13 14:13:49.888716: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Current predictions: [[9.9969721e-01 1.0639867e-05 2.9223689e-04]]
Class label = 0
Class name = Good2
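For reference, the inference call looked roughly like this (flags as I remember them from the TLT 2.0 classification docs; the classmap path is a placeholder for the file generated during training):
tlt-infer classification -m ./results/weights/resnet_008.tlt -k <my pass> -d ./testing-data/Good2 -cm ./results/classmap.json -b 8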
Then I exported the model to the .etlt format to be used on the Jetson Nano:
tlt-export classification -m ./results/weights/resnet_008.tlt -k <my pass> --data_type fp16 -o ./resnet_008.etlt
and created an MP4 video from my test image set (Good, Leaked and Scratched) so I could feed it into deepstream-app.
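The video was assembled from the still test images with ffmpeg, roughly like this (frame rate, glob pattern and output resolution are just what I used; adjust as needed):
ffmpeg -framerate 2 -pattern_type glob -i './testing-data/*/*.jpg' -vf 'scale=1280:720,format=yuv420p' -c:v libx264 ../video/testing.mp4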
Then I created the config files for deepstream-app (again as per the instructions: Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation).
Here is my version of the nvinfer config:
[property]
gpu-id=0
net-scale-factor=1
offsets=123.675;116.28;103.53
#model-color-format=1
batch-size=1
tlt-model-key=<my pass>
tlt-encoded-model=../models/resnet_008.etlt
model-engine-file=../models/resnet_008.etlt_b1_gpu0_fp16.engine
labelfile-path=labels.txt
infer-dims=3;224;224
uff-input-blob-name=input_1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
process-mode=1
interval=0
network-type=1
gie-unique-id=1
output-blob-names=predictions/Softmax
classifier-threshold=0.6
One thing to note about the labels example in the tutorial: it seems there is an error. For a classifier, the categories in the labels file should be listed one after another separated by ';', not one per line as instructed in the tutorial. So my labels file just contains: Good;Leaked;Scratched;
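In case it is useful, that single-line labels file can be produced like this (labels.txt is the name referenced by labelfile-path above):
printf 'Good;Leaked;Scratched;' > labels.txt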
I also configured DeepStream to output the result overlay via RTSP. Here is my config file for the app:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../video/testing.mp4
num-sources=8
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0
[sink2]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400
[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=8
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=768
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
## If set to TRUE, system timestamp will be attached as ntp timestamp
## If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
# attach-sys-ts-as-ntp=1
# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
nvbuf-memory-type=0
config-file=config_infer.txt
Finally I ran deepstream-app with the config file above; the engine file got built, the app performed inference, and the result overlay could be viewed with ffplay via RTSP. Unfortunately, this is where the model stops working. For some reason I only get the 'Leaked' label displayed. Always.
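For reference, I launch the app and view the stream roughly like this (the app config file name here is just illustrative, /ds-test is the deepstream-app default mount point, and <jetson-ip> is a placeholder):
deepstream-app -c deepstream_app_config.txt
ffplay rtsp://<jetson-ip>:8554/ds-test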
I have played with classifier-threshold. Initially I had it at 0.2; if I set it to 0.8, the 'Leaked' label sometimes disappeared. When I set net-scale-factor to 0.5 I got the 'Leaked' and 'Scratched' labels displayed, but they were not related to what was on the screen at that particular moment.
I have also tried increasing the number of epochs to 30 during training, as well as pruning and retraining, but it didn't make any difference.
So where did I make a mistake in following the tutorial?
Environment
Training machine with GeForce GTX 1650:
TensorFlow: 1.15
CUDA: 10
Training docker container: nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
Deployment jetson nano:
Jetpack 4.4
TensorRT 7.1.3 + CUDA 10.2