Retrained model shows worse results compared to NVIDIA provided model

DeepStream 7.0, dGPU.

I retrained detectnet_v2 following this notebook: tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2.ipynb at main · NVIDIA/tao_tutorials · GitHub, with the same KITTI dataset as described there.

I literally did not change any of the parameters; I just followed the notebook. The training was done on a Gigabyte laptop with a GeForce RTX 3060 and ran for about 7 hours, twice (once more for the pruned model).

I copied the resulting files

├── calibration.bin
├── labels.txt
├── nvinfer_config.txt
├── resnet18_detector.onnx

up to my AWS T4 instance and ran the inference there.
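
For context, these files came out of the export step at the end of the notebook; from memory it is roughly the following (the exact options, including whether an encryption key -k is required, depend on the TAO version, and the paths are the notebook's environment-variable placeholders):

tao model detectnet_v2 export \
    -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5 \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.onnx \
    --gen_ds_config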

I managed to use the resulting files with my DS 7.0 app.

But the results are bad compared to the default NVIDIA resnet18_trafficcamnet.

Here is the configuration for the retrained model resnet18-detector:

[property]
gpu-id=0
net-scale-factor=0.00392156862745098
offsets=0;0;0
infer-dims=3;384;1248
tlt-model-key=tlt_encode
network-type=0
network-mode=2
labelfile-path=models/primary-detector/resnet18-detector/labels.txt
onnx-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx
model-engine-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx_b1_gpu0_fp16.engine
int8-calib-file=models/primary-detector/resnet18-detector/calibration.bin
batch-size=1
num-detected-classes=3
model-color-format=0
maintain-aspect-ratio=0
output-tensor-meta=0
cluster-mode=2
gie-unique-id=1
uff-input-order=0
#output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
uff-input-blob-name=input_1


[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.4
group-threshold=1

The configuration for the NVIDIA-provided resnet18-trafficcamnet model:

################################################################################
# SPDX-FileCopyrightText: Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#   custom-lib-path,
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.


# RESNET18-TRAFFICCAMNET model

[property]
gpu-id=0
net-scale-factor=0.00392156862745098
model-color-format=0
maintain-aspect-ratio=1
scaling-filter=0
scaling-compute-hw=0
tlt-model-key=tlt_encode
tlt-encoded-model=models/primary-detector/resnet18-trafficcamnet/resnet18_trafficcamnet.etlt
model-engine-file=models/primary-detector/resnet18-trafficcamnet/resnet18_trafficcamnet.etlt_b1_gpu0_fp16.engine
labelfile-path=models/primary-detector/resnet18-trafficcamnet/labels.txt
int8-calib-file=models/primary-detector/resnet18-trafficcamnet/cal_trt.bin
force-implicit-batch-dim=1
batch-size=1
process-mode=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
uff-input-order=0
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
cluster-mode=2
infer-dims=3;544;960
#operate-on-class-ids=2
#filter-out-class-ids=0;1;3;

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.4
group-threshold=1

I’m running inference on an FFmpeg feed, which the inference solution consumes via RTSP. I then fetch the annotated video back from the RTSP server. The video is an arbitrary Berlin street scene playing in a loop, which will be removed again once this problem is resolved.
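
For reference, the feed is pushed roughly like this (the file name and RTSP endpoint are placeholders, and the exact options may differ from what I actually run):

ffmpeg -re -stream_loop -1 -i berlin_street_scene.mp4 -c:v copy -an -f rtsp rtsp://<rtsp-server>:8554/input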

The results obtained with the retrained model resnet18-detector:

The results obtained with the original resnet18-trafficcamnet model:

Please note how much better the trafficcamnet detections look, especially for the green car coming from the left and one of the cyclists. The cars passing by are also detected much more smoothly…

Since you are training with the KITTI dataset, as mentioned in the notebook, you can split the KITTI dataset and run inference/evaluation against the held-out part of it (a rough example command is sketched below).
For a Berlin street scene, the KITTI dataset may have a different data distribution, so it is better to train on a Berlin street scene dataset and run inference on that.
Regarding the trafficcamnet pretrained model mentioned in the NGC model card, TrafficCamNet | NVIDIA NGC: TrafficCamNet v1.0 was trained on a proprietary dataset with more than 3 million objects for the car class alone, plus the other classes. That is why its detections look better on this kind of traffic footage.
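
As a rough sketch only (the exact command form, model file and key handling depend on your TAO version and directory layout; these paths are the notebook's placeholders), evaluating the retrained model on the KITTI validation split looks something like:

tao model detectnet_v2 evaluate \
    -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.hdf5

This should report per-class mAP on the validation fold defined in the spec, which tells you whether the training itself converged before you compare against a completely different scene.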

What do you mean by that?

I mean, KITTI is supposed to support autonomous driving, so it should be good enough even for that street scene…

Currently, fine-tuning trafficcamnet (which is based on the detectnet_v2 network) without forgetting is not supported. But as described in https://developer.nvidia.com/blog/training-custom-pretrained-models-using-tlt/, with a frozen convolutional layer the weights in that layer do not change during the loss update. This is especially helpful in transfer learning, where you can reuse the features provided by the pretrained weights and reduce training time. You can try freezing some layers (see the sketch below). Also, it is better to retrain with a Berlin street scene dataset.
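
For example, a minimal sketch of the model_config section of the training spec with the early ResNet blocks frozen (the path is a placeholder, the choice of blocks to freeze is up to you, and the remaining fields stay as in the default spec):

model_config {
  pretrained_model_file: "<path to the downloaded pretrained weights>"
  num_layers: 18
  arch: "resnet"
  # keep the weights of the first two residual blocks fixed during training
  freeze_blocks: 0
  freeze_blocks: 1
  # ... other fields unchanged from the default spec
}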

Repeated the training directly from the detectnet_v2 notebook on a T4 instance without any parameter change. The results are simply… not acceptable. This is utter bullshit; it cannot compete with either YOLO or trafficcamnet. You should really think about reworking this tutorial so that it at least reaches trafficcamnet accuracy, otherwise it is just a big disappointment after 24 hours of training.

Or at least lower the expectations by stating up front what can realistically be expected.

It also doesn't work even half as well as the other models on a NY street scene…

Can you share the training spec file? And what is the resolution of your training images?


Well, I just ran the notebook. What could I share that you don’t already have?

Do you mean you used the default spec file, which is the same as tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt at main · NVIDIA/tao_tutorials · GitHub?

Exactly.

Just to tell you where I stand: I have never dealt with training before. I’m a total newbie here. What other choice do I have than to follow your tutorials? They are by far not self-explanatory, so in order to have at least SOMETHING before going my own way, I was trying to “just” follow your stuff. Wrong?

But since you mention that: my final model only has three classes: car (everything is a car now), pedestrian and cyclist. No van, no sitting person.
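
If I read the default spec correctly, that is exactly what its dataset_config does: it folds the related KITTI labels into three target classes, roughly like this (quoted from memory, so the exact wording may differ):

target_class_mapping {
  key: "car"
  value: "car"
}
target_class_mapping {
  key: "van"
  value: "car"
}
target_class_mapping {
  key: "pedestrian"
  value: "pedestrian"
}
target_class_mapping {
  key: "person_sitting"
  value: "pedestrian"
}
target_class_mapping {
  key: "cyclist"
  value: "cyclist"
}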

If you follow the default notebook (tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2.ipynb at main · NVIDIA/tao_tutorials · GitHub) and its spec file (tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt at main · NVIDIA/tao_tutorials · GitHub), please note that the pretrained model it downloads from NGC and points to in the spec is the general-purpose ResNet-18 backbone (pretrained_detectnet_v2:resnet18), not TrafficCamNet.

So, if you are using the default notebook and spec file, it is not using the trafficcamnet pretrained model.
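
For reference, the backbone the notebook pulls comes from this NGC download step (roughly, from memory; the destination path is the notebook's placeholder):

ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet18 --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18

The spec's pretrained_model_file then points at that downloaded file, not at the TrafficCamNet weights.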

I’m sure I know this by now. There was also nothing that said I would be doing trafficcamnet training, so I didn’t expect that from the beginning.

But since my app currently uses two different models - yolov7-tiny and trafficcamnet - and both perform WAY BETTER, I had the obviously wrong expectation that a costly training, which is praised and advertised as THE way to do things, would at least produce something usable.

It cannot, and it was wasted effort to try; that is what I know now. It would have been better to have known that beforehand.

Thanks anyway for your efforts to help me with both problems. But your TAO is a PITA…

The tutorial notebook shows end users how to train a detectnet_v2 network with the public KITTI dataset and a pretrained model from NGC.
Trafficcamnet is actually also trained with the detectnet_v2 network, but on a proprietary dataset with more than 3 million objects.

Yepp, you mentioned that. I did expect too much, obviously.