DS 7
dGPU.
I retrained detectnet_v2 following the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2.ipynb from the NVIDIA/tao_tutorials repository on GitHub, using the same KITTI dataset described there.
I did not change any of the parameters; I just followed the notebook. The training was done on a Gigabyte laptop with a GeForce RTX 3060 and ran for about 7 hours, twice (the second run for the pruned model).
I copied the resulting files
├── calibration.bin
├── labels.txt
├── nvinfer_config.txt
├── resnet18_detector.onnx
up to my AWS T4 instance and ran the inference there.
I managed to use the resulting files with my DS 7.0 app.
But the results are bad compared to the default NVIDIA resnet18_trafficcamnet.
Here is the configuration for the retrained resnet18-detector model:
[property]
gpu-id=0
net-scale-factor=0.00392156862745098
offsets=0;0;0
infer-dims=3;384;1248
tlt-model-key=tlt_encode
network-type=0
network-mode=2
labelfile-path=models/primary-detector/resnet18-detector/labels.txt
onnx-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx
model-engine-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx_b1_gpu0_fp16.engine
int8-calib-file=models/primary-detector/resnet18-detector/calibration.bin
batch-size=1
num-detected-classes=3
model-color-format=0
maintain-aspect-ratio=0
output-tensor-meta=0
cluster-mode=2
gie-unique-id=1
uff-input-order=0
#output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
uff-input-blob-name=input_1
[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.4
group-threshold=1
The configuration for the stock resnet18-trafficcamnet model:
################################################################################
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
# Following properties are mandatory when engine files are not specified:
# int8-calib-file(Only in INT8)
# Caffemodel mandatory properties: model-file, proto-file, output-blob-names
# UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
# ONNX: onnx-file
#
# Mandatory properties for detectors:
# num-detected-classes
#
# Optional properties for detectors:
# cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
# custom-lib-path,
# parse-bbox-func-name
#
# Mandatory properties for classifiers:
# classifier-threshold, is-classifier
#
# Optional properties for classifiers:
# classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
# operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
# input-object-min-width, input-object-min-height, input-object-max-width,
# input-object-max-height
#
# Following properties are always recommended:
# batch-size(Default=1)
#
# Other optional properties:
# net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
# model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
# mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
# custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.
# RESNET18-TRAFFICCAMNET model
[property]
gpu-id=0
net-scale-factor=0.00392156862745098
model-color-format=0
maintain-aspect-ratio=1
scaling-filter=0
scaling-compute-hw=0
tlt-model-key=tlt_encode
tlt-encoded-model=models/primary-detector/resnet18-trafficcamnet/resnet18_trafficcamnet.etlt
model-engine-file=models/primary-detector/resnet18-trafficcamnet/resnet18_trafficcamnet.etlt_b1_gpu0_fp16.engine
labelfile-path=models/primary-detector/resnet18-trafficcamnet/labels.txt
int8-calib-file=models/primary-detector/resnet18-trafficcamnet/cal_trt.bin
force-implicit-batch-dim=1
batch-size=1
process-mode=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
uff-input-order=0
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
cluster-mode=2
infer-dims=3;544;960
#operate-on-class-ids=2
#filter-out-class-ids=0;1;3;
[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.4
group-threshold=1
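For anyone who wants to compare the two nvinfer configs quickly, here is a minimal sketch that prints every property that differs between them (the file names are placeholders, assuming the two configs above are saved locally; they are not the actual paths from my setup):

```python
#!/usr/bin/env python3
# Minimal sketch: print every nvinfer property that differs between the two
# configs above. The file names below are placeholders, not my real paths.
import configparser

def load(path):
    cfg = configparser.ConfigParser(strict=False)
    cfg.read(path)
    return cfg

det = load("config_infer_resnet18_detector.txt")       # retrained model config
tcn = load("config_infer_resnet18_trafficcamnet.txt")   # stock trafficcamnet config

for section in sorted(set(det.sections()) | set(tcn.sections())):
    det_keys = set(det.options(section)) if det.has_section(section) else set()
    tcn_keys = set(tcn.options(section)) if tcn.has_section(section) else set()
    for key in sorted(det_keys | tcn_keys):
        a = det.get(section, key, fallback=None)
        b = tcn.get(section, key, fallback=None)
        if a != b:
            print(f"[{section}] {key}: detector={a!r}  trafficcamnet={b!r}")
```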
I’m running inference on an FFmpeg feed, which the inference app consumes over RTSP; I then fetch the annotated video back from the RTSP server. The source video is an arbitrary Berlin street scene played in a loop, which I will remove again once this problem is resolved.
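For completeness, this is roughly how the looping clip is pushed to the RTSP server (a minimal sketch, assuming ffmpeg is installed and an RTSP server is already listening; the file name and URL are placeholders, not my actual setup):

```python
#!/usr/bin/env python3
# Minimal sketch: loop a local test clip into the RTSP server that the
# DeepStream app consumes. File name and URL are placeholders.
import subprocess

SOURCE = "berlin_street_scene.mp4"        # placeholder test clip
RTSP_URL = "rtsp://127.0.0.1:8554/input"  # placeholder ingest URL

# -re              read the input at its native frame rate
# -stream_loop -1  loop the clip indefinitely
# -c copy          pass the encoded stream through without re-encoding
# -f rtsp          publish the stream to the RTSP server
subprocess.run([
    "ffmpeg", "-re", "-stream_loop", "-1",
    "-i", SOURCE,
    "-c", "copy",
    "-f", "rtsp", RTSP_URL,
], check=True)
```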
The results obtained with the retrained resnet18-detector model:
The results obtained with the original resnet18-trafficcamnet model:
Please note how much better the trafficcamnet detections look, especially for the green car coming in from the left and for one of the cyclists. The passing cars are also detected much more smoothly…