Tutorial for training an object detection model for 4 object classes

Hi,
I’m planning to use DeepStream 6 on a Jetson Nano 2GB to detect several classes of objects from a camera:
bicycle
motorcycle
people

I’m quite new to this and noticed there are many pretrained models in NGC. From what I’ve seen, none of them seems to meet my requirement; the existing vehicle detection models are mostly built for cars.

The only datasets I know are Pascal VOC and MS COCO. Does this mean I can download them and train directly with TAO? Does TAO support these datasets?

Could you provide suggestions?

  1. Pascal VOC and MS COCO are public datasets. Users can run training with them in TAO.
  2. What is your requirement for the detection model? Which 4 classes?

Thanks, Morganh.
The target classes are listed in the post:
Bicycle, Motorcycle, People. (Later I will add a private custom door sign class, but that can wait for now.)

So my questions are:

  1. Can any existing model in NGC satisfy this, so that I can use it directly in DeepStream?
  2. If I have to train on a public dataset, is there a tutorial based on TAO Toolkit for training on a well-known public dataset? As far as I know, those datasets use labeling formats that TAO does not support.
  3. Can I train by following the steps from GitHub user nv-Dusty (training-the-ssd-mobilenet-model), which finally produce a model in .onnx format? What are the pros and cons compared with training directly in TAO? At least I see an issue

thank you.

  1. DashCamNet or TrafficCamNet (both described in the TAO Toolkit 3.22.02 documentation) can cover people and bicycles or two-wheelers. You can deploy one of them to check whether it meets your requirement.
    TrafficCamNet | NVIDIA NGC
    DashCamNet | NVIDIA NGC

  2. TAO provides a Jupyter notebook that shows how to train on the public KITTI dataset. You can download it from the TAO Toolkit Quick Start Guide (TAO Toolkit 3.22.05 documentation).

For the VOC dataset, please refer to the blog post https://developer.nvidia.com/blog/preparing-state-of-the-art-models-for-classification-and-object-detection-with-tao-toolkit/ . The "Prepare the PASCAL VOC dataset" section explains how to convert the XML labels to KITTI-format labels. The xml_to_kitti.py Python script handles this conversion.
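To give a rough idea of what such a conversion involves, here is a minimal sketch. It is not the blog’s xml_to_kitti.py script; the function name `voc_xml_to_kitti_lines` and the class mapping are assumptions made for this thread’s three classes. It relies on VOC naming its classes "bicycle", "motorbike" and "person", and on a KITTI label line having 15 space-separated fields, of which only the class name and the bounding box are filled in here (the rest are written as zeros).

```python
import xml.etree.ElementTree as ET

# Assumed mapping from Pascal VOC class names to the class names wanted here.
VOC_TO_TARGET = {
    "bicycle": "bicycle",
    "motorbike": "motorcycle",
    "person": "person",
}


def voc_xml_to_kitti_lines(xml_text, class_map=VOC_TO_TARGET):
    """Convert one VOC annotation (XML string) into KITTI label lines.

    Objects whose class is not in class_map are skipped.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for obj in root.findall("object"):
        name = (obj.findtext("name") or "").strip().lower()
        if name not in class_map:
            continue
        box = obj.find("bndbox")
        if box is None:
            continue
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # KITTI fields: type, truncated, occluded, alpha, bbox (4 values),
        # dimensions (3), location (3), rotation_y. Unused ones are zero.
        lines.append(
            f"{class_map[name]} 0.00 0 0.00 "
            f"{xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
        )
    return lines
```

Each VOC XML file would produce one .txt label file with the same base name as its image, containing the returned lines joined by newlines.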

For the COCO dataset, see the "Prepare the COCO 2014 and COCO 2017 datasets" section. To convert the JSON labels to KITTI format, use the coco2kitti.py Python script.
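The COCO conversion follows the same idea. The sketch below is not the blog’s coco2kitti.py script; the function name `coco_to_kitti` and the class filter are assumptions for the three classes in this thread. It takes an already-loaded COCO annotation dictionary, keeps only the wanted categories, and turns each COCO box, stored as [x, y, width, height], into the KITTI corner form xmin, ymin, xmax, ymax.

```python
from collections import defaultdict

# Assumed filter: COCO category names mapped to the class names wanted here.
COCO_TO_TARGET = {
    "bicycle": "bicycle",
    "motorcycle": "motorcycle",
    "person": "person",
}


def coco_to_kitti(coco, class_map=COCO_TO_TARGET):
    """Return {image file name: [KITTI label lines]} from a COCO dict."""
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    labels = defaultdict(list)
    for ann in coco["annotations"]:
        name = cat_names.get(ann["category_id"])
        if name not in class_map:
            continue
        x, y, w, h = ann["bbox"]  # COCO stores top-left corner plus size
        xmin, ymin, xmax, ymax = x, y, x + w, y + h
        # Only the class and the box are filled in; other KITTI fields are zero.
        labels[file_names[ann["image_id"]]].append(
            f"{class_map[name]} 0.00 0 0.00 "
            f"{xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
        )
    return dict(labels)
```

In practice the dictionary would come from `json.load` on a COCO annotation file, and each entry of the result would be written to a .txt file named after its image.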

  3. No. TAO can only deploy .etlt files or TensorRT engine files (.plan, .engine, or .trt).
    See more in Frequently Asked Questions — TAO Toolkit 3.22.05 documentation
    and Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation
    and Overview — TAO Toolkit 3.22.05 documentation

Thanks, Morganh.
DashCamNet detects:

  • car
  • persons
  • road signs
  • bicycles.

TrafficCamNet detects:

  • car
  • persons
  • road signs
  • two-wheelers

Since I need separate Bicycle and Motorcycle classes, I just want to make sure: I can’t directly use the existing models for my situation, correct?

About the training: according to the blog post "Preparing State-of-the-Art Models for Classification and Object Detection…", object detection training starts from ImageNet-pretrained weights. Since training an ImageNet model from scratch is not available here, can I replace those weights with a pretrained model from NGC? Then I could do quick transfer learning on it with the PASCAL VOC dataset. Do you have a recommendation for which base model I should start transfer learning from?

For your case, if you need separate Bicycle and Motorcycle classes, you can retrain using the existing pretrained models from NGC.

For the unpruned DashCamNet or TrafficCamNet model above, please use the detectnet_v2 network with the resnet18 backbone.

You can also use the detectnet_v2 network with other backbones to train on the VOC dataset from scratch.

In addition, you can train yolo_v4 or other networks on the VOC dataset from scratch.
