Classification_tf2 using deepstream python app

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Desktop
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Classification
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)

results_dir: '/workspace/tao-experiments/classification_tf2/output/color_dataset_split'
dataset:
  train_dataset_path: "/workspace/tao-experiments/data/color_dataset_split/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/color_dataset_split/split/val"
  preprocess_mode: 'torch'
  num_classes: 9
  augmentation:
    enable_color_augmentation: True
    enable_center_crop: True
train:
  qat: False
  checkpoint: ''
  batch_size_per_gpu: 64
  num_epochs: 120
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.05
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I ran into an issue when deploying with Python. After training, I exported the model to ONNX format so that I could use it in the Python code. This is the example I used to run my classification model: GitHub - NVIDIA-AI-IOT/deepstream_python_apps: DeepStream SDK Python bindings and sample applications.

[property]
net-scale-factor = 1

onnx-file=../../models/Secondary_ColorDetection/vehicle_cctv_dataset.onnx
model-engine-file=../../models/Secondary_ColorDetection/vehicle_cctv_dataset.onnx_b64_gpu0_fp32.engine
labelfile-path=../../models/Secondary_ColorDetection/color_label.txt

# 0=FP32 and 1=INT8 mode
batch-size=64
network-mode=0

# 1=Primary 2=Secondary
process-mode=2
gie-unique-id=7
model-color-format=0
operate-on-gie-id=1
# Remove the next line to run on all detected objects
operate-on-class-ids=0
network-type=1
num-detected-classes = 9
infer-dims=3;256;256
classifier-threshold = 0.8
is-classifier=1
output-blob-names=predictions/Softmax

This is my config file for color classification.
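
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: the training spec uses preprocess_mode: 'torch', while the config above sets net-scale-factor = 1 and has no offsets line. nvinfer preprocesses each pixel as y = net-scale-factor * (x - offset), with a single scale factor shared by all channels. The sketch below assumes 'torch' mode means scaling pixels to [0, 1] and then normalizing with the ImageNet mean and standard deviation, and derives approximate values for the two keys. The helper name nvinfer_preprocess_params is hypothetical.

```python
# Assumption: 'torch' preprocess mode scales pixels to [0, 1] and then
# normalizes each channel with the ImageNet mean and standard deviation.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def nvinfer_preprocess_params(mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Approximate torch-style normalization as y = scale * (x - offset)."""
    # Per-channel offsets are the means expressed on the 0..255 pixel scale.
    offsets = [round(m * 255.0, 3) for m in mean]
    # nvinfer accepts only one scale factor, so the per-channel standard
    # deviations are averaged here; this is an approximation.
    mean_std = sum(std) / len(std)
    scale = 1.0 / (255.0 * mean_std)
    return scale, offsets


scale, offsets = nvinfer_preprocess_params()
print(f"net-scale-factor={scale:.6f}")
print("offsets=" + ";".join(f"{o:g}" for o in offsets))
```

If the assumption holds, the printed values would replace net-scale-factor = 1 and add an offsets line in the [property] section. The channel order of offsets has to match model-color-format.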

The color classification in the Python app is not accurate. I ran inference on the same images from the videos using

tao model classification_tf2 inference

It gives the results I expect, which differ from the results of the Python app.

Currently, “center_crop” is not supported in DeepStream (similar topic: How to set true center crop for classification model in deepstream pipeline?).

Please train with “enable_center_crop: false”. A similar topic can be found in Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT - #36 by junhui1.
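
To illustrate why center crop matters here, the following is a minimal sketch of the geometry of a center crop. It is not TAO's actual implementation, and the 292x292 resize size is a hypothetical value chosen only for the example. A model trained on center-cropped inputs sees only the middle of each image, whereas a pipeline without center crop feeds the whole resized object, so the framing differs between training and deployment.

```python
def center_crop_box(width, height, crop_width, crop_height):
    """Return (left, top, right, bottom) of a centered crop window."""
    if crop_width > width or crop_height > height:
        raise ValueError("crop window is larger than the image")
    left = (width - crop_width) // 2
    top = (height - crop_height) // 2
    return left, top, left + crop_width, top + crop_height


def visible_fraction(width, height, crop_width, crop_height):
    """Fraction of the source area that survives the center crop."""
    left, top, right, bottom = center_crop_box(
        width, height, crop_width, crop_height
    )
    return ((right - left) * (bottom - top)) / (width * height)


# Hypothetical sizes: an image resized to 292x292, then cropped to 256x256.
box = center_crop_box(292, 292, 256, 256)
print(box, round(visible_fraction(292, 292, 256, 256), 3))
```

With these example sizes, roughly a quarter of the image area is discarded at the borders before the network sees it.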

Should I disable “enable_color_augmentation”?

No. You can keep previous setting.

I have turned off enable_center_crop. Now no results are shown on the screen at all.

As mentioned above, please retrain with “enable_center_crop: false”. Similar topic can be found in Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT - #36 by junhui1.

Hi, I have tried different ways to fine-tune it, but I still do not get the results I want. Do you have any recommendations for vehicle color detection?

How about the result when you run tao model classification_tf2 inference? Is the result expected?

enable_center_crop: False
enable_random_crop: False

With these options set to False, the tao inference result is still better than the DeepStream result, but the model trained with them set to True gives better results.

What is the evaluation result (tao model classification_tf2 evaluation) if you train with True?

And what is the evaluation result (tao model classification_tf2 evaluation) if you train with False?

enable_center_crop: False

24/24 [==============================] - 6s 52ms/step - loss: 2.4216 - topk_acc: 0.9649
Evaluation Loss: 2.421600818634033
Evaluation Top 3 accuracy: 0.9649122953414917
Found 741 images belonging to 9 classes.
Calculating per-class P/R and confusion matrix. It may take a while...
Confusion Matrix
[  0   1   0   1   0   0   0   0  10]]
Classification Report
              precision    recall  f1-score   support

       black       0.86      0.57      0.69        42
        blue       0.80      0.71      0.75        45
       brown       0.22      0.36      0.27        14
       green       0.15      0.83      0.25        12
      orange       0.77      0.77      0.77        13
      purple       0.75      0.69      0.72        13
         red       1.00      0.82      0.90        44
       white       0.98      0.90      0.94       546
      yellow       0.62      0.83      0.71        12

    accuracy                           0.85       741
   macro avg       0.68      0.72      0.67       741
weighted avg       0.92      0.85      0.88       741

enable_center_crop: True

24/24 [==============================] - 6s 60ms/step - loss: 2.5639 - topk_acc: 0.9595
Evaluation Loss: 2.563930034637451
Evaluation Top 3 accuracy: 0.9595141410827637
Found 741 images belonging to 9 classes.
Calculating per-class P/R and confusion matrix. It may take a while...
Confusion Matrix
[  0   0   0   1   1   0   0   0  10]]
Classification Report
              precision    recall  f1-score   support

       black       0.73      0.86      0.79        42
        blue       0.91      0.64      0.75        45
       brown       0.47      0.57      0.52        14
       green       0.14      0.83      0.23        12
      orange       0.34      0.92      0.50        13
      purple       0.42      0.62      0.50        13
         red       0.75      0.61      0.67        44
       white       0.98      0.84      0.91       546
      yellow       0.77      0.83      0.80        12

    accuracy                           0.81       741
   macro avg       0.61      0.75      0.63       741
weighted avg       0.90      0.81      0.84       741

I accidentally deleted the model I trained previously, so these results are from the latest models, trained with the two different configurations. I found that they do not differ much. Do you have any clues?
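
When reading the two reports above, it may help to see how the averages are formed. The sketch below recomputes the macro and support-weighted F1 from the per-class rows of the enable_center_crop: False report (values copied from that report; the function names are hypothetical). It also shows how much of the evaluation set is the white class, which is why the weighted average stays high while minority classes such as brown and green score poorly.

```python
# Per-class (f1, support) copied from the enable_center_crop: False report.
report = {
    "black": (0.69, 42),
    "blue": (0.75, 45),
    "brown": (0.27, 14),
    "green": (0.25, 12),
    "orange": (0.77, 13),
    "purple": (0.72, 13),
    "red": (0.90, 44),
    "white": (0.94, 546),
    "yellow": (0.71, 12),
}


def macro_f1(rows):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_f1(rows):
    """Mean of per-class F1 weighted by the number of samples per class."""
    total = sum(support for _, support in rows.values())
    return sum(f1 * support for f1, support in rows.values()) / total


total_support = sum(support for _, support in report.values())
white_share = report["white"][1] / total_support

print(f"macro F1:    {macro_f1(report):.2f}")
print(f"weighted F1: {weighted_f1(report):.2f}")
print(f"white share: {white_share:.1%} of {total_support} images")
```

Because the white class dominates the set, overall accuracy and the weighted average mostly reflect that one class. Comparing macro averages or per-class rows gives a clearer picture of the minority colors.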

It seems that there is not much difference.
To improve accuracy, I suggest using the classification_pyt network. It offers backbones such as FAN and GC-ViT.
Notebook: tao_tutorials/notebooks/tao_launcher_starter_kit/classification_pyt/classification.ipynb at main · NVIDIA/tao_tutorials · GitHub
Doc: Image Classification PyT - NVIDIA Docs

Do you have any idea what this error is about?
It occurred when I started training with classification_pyt.

E0912 02:50:52.693000 139875444733760 torch/distributed/elastic/multiprocessing/api.py:881] failed (exitcode: 2) local_rank: 0 (pid: 541) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/classification/scripts/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:  time      : 2024-09-12_02:50:52
  host      : d1baa8789cb5
  rank      : 0 (local_rank: 0)
  exitcode  : 2 (pid: 541)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

For the classification_pyt question, please create a new forum topic and upload the command, the full log, etc. Thanks.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
