Not getting good results when training a model with ResNet-10 and ResNet-18 using TLT

I have trained my model on the following classes (car, bus, bike, truck, lcv, van) with both ResNet-10 and ResNet-18, but I am not getting good results. Many times, tlt-infer puts the bounding boxes of buses, trucks, LCVs and vans under a single class, for example truck.
I have seen that the model used in DS (resnet10.caffemodel -> Primary_detector) has the classes car, person, cycle and road-sign, but when I run it on buses, trucks and LCVs, it detects them as car too. So can I merge car, bus, truck, lcv and van into a single category such as vehicle when training the ResNet-10 detector, and then add a classifier that classifies each vehicle as bus, car, truck, van or lcv? Is this the right approach? Please help me out.
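The two-stage idea in the question (one coarse "vehicle" detector, then a classifier on each detected box) can be sketched as below. This is plain Python with hypothetical `detect` and `classify` callables standing in for the trained detector and classifier; it is not the DeepStream or TLT API, only an illustration of how the two stages hand results to each other.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    label: str
    box: Box
    score: float


def two_stage_inference(
    image,
    detect: Callable[[object], List[Detection]],  # hypothetical primary detector
    classify: Callable[[object, Box], str],       # hypothetical secondary classifier
    primary_class: str = "vehicle",
) -> List[Detection]:
    """Run the coarse detector, then relabel each primary-class box with the classifier."""
    results = []
    for det in detect(image):
        if det.label == primary_class:
            # Second stage: classify the boxed region into a fine-grained type
            # (car, bus, truck, van, lcv); keep the detector's box and score.
            fine_label = classify(image, det.box)
            results.append(Detection(fine_label, det.box, det.score))
        else:
            # Non-vehicle detections (e.g. person) pass through unchanged.
            results.append(det)
    return results
```

With this split, the detector only has to learn "is this a vehicle and where is it", while the visually similar vehicle types are separated by a classifier that sees a cropped, larger view of each object.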

Please also let me know the best practices for training a ResNet-10 model with the Transfer Learning Toolkit to get more accurate results.
What minimum dataset size should we use? What image resolution should we use? How many epochs should we train for? Are there any other things needed to get accurate results from training? Please suggest.

Any help will be appreciated, Thanks.

The DS detector was trained on 4 classes (car, person, cycle, road-sign). Objects that look similar to a car will be mapped to the car class. So if you use it for inference on your own images, it is reasonable for it to detect bus/truck/lcv/van as car.

The number of classes depends on your requirements. If you want to train a detector that can detect car and truck, you should train 2 classes. If you want a detector that can detect car, truck and bus, you should train 3 classes. If you think trucks and buses are similar, you can map truck and bus to a single class, and then you only need to train two classes.
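Mapping several raw classes to one training class can be done by rewriting the class name in the label files before training. The sketch below assumes KITTI-format labels (class name as the first whitespace-separated field) and uses a hypothetical mapping that merges truck and bus into one class named `heavy_vehicle`; classes missing from the mapping are dropped. Depending on your TLT network and version, the training spec may also offer a class-mapping option that avoids rewriting files, so check the documentation first.

```python
from typing import Dict, List, Optional

# Hypothetical mapping: raw dataset class -> class used for training.
# truck and bus are merged into a single "heavy_vehicle" class.
CLASS_MAP: Dict[str, str] = {
    "car": "car",
    "truck": "heavy_vehicle",
    "bus": "heavy_vehicle",
}


def remap_kitti_line(line: str, class_map: Dict[str, str]) -> Optional[str]:
    """Rewrite the class field (first token) of one KITTI label line.

    Class names are matched case-insensitively. Returns None for empty
    lines and for classes not in the mapping, so they can be dropped.
    """
    fields = line.split()
    if not fields:
        return None
    target = class_map.get(fields[0].lower())
    if target is None:
        return None
    return " ".join([target] + fields[1:])


def remap_kitti_labels(lines: List[str], class_map: Dict[str, str]) -> List[str]:
    """Remap every line of one label file, dropping unmapped objects."""
    out = []
    for line in lines:
        new = remap_kitti_line(line, class_map)
        if new is not None:
            out.append(new)
    return out
```

Dropping unmapped classes is a design choice for this sketch; if those objects still appear in the images, leaving them unlabeled teaches the detector to treat them as background.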

Many factors affect the mAP you get from training. For example,

  1. the training spec and hyperparameter tuning
  2. the training images (quantity, resolution, etc.)
  3. the correctness of the labels
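For point 3, a quick script can count objects per class and flag obviously broken boxes before you start training. This sketch assumes KITTI-format label lines (class name followed by 14 numeric fields, with the 2D box x1, y1, x2, y2 in fields 4 to 7) and a known image size; the function name and the checks it performs are illustrative, not part of TLT.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_kitti_labels(
    lines: Iterable[str], img_w: int, img_h: int
) -> Tuple[Counter, List[str]]:
    """Count objects per class and collect human-readable label problems.

    Assumes KITTI label lines: class name plus 14 numeric fields,
    with the 2D box (x1, y1, x2, y2) in fields 4..7.
    """
    counts: Counter = Counter()
    problems: List[str] = []
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 15:
            problems.append(f"line {i}: expected 15 fields, got {len(fields)}")
            continue
        cls = fields[0]
        x1, y1, x2, y2 = map(float, fields[4:8])
        counts[cls] += 1
        if x2 <= x1 or y2 <= y1:
            problems.append(f"line {i}: degenerate box for {cls}")
        elif x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
            problems.append(f"line {i}: box outside {img_w}x{img_h} image")
    return counts, problems
```

Running it over the whole dataset also shows the per-class object counts, which helps spot classes (such as lcv or van) that have far fewer examples than the others.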

Thanks Morganh for your response.
Is there any standard image size we need to use during training, such as 1280x720? And what is the minimum number of images required per class, e.g. 10k or more?

There is no standard image size for training. If your images are 1280x720, you can try training at that resolution. Alternatively, resize the images and labels to a smaller resolution so you can run more experiments.
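When images are resized, the box coordinates in the labels must be scaled by the same factors, otherwise the boxes no longer line up with the objects. This sketch assumes KITTI-format labels (2D box in fields 4 to 7) and only rewrites the label line; the image itself has to be resized separately to exactly the new width and height.

```python
def resize_kitti_line(line: str, old_w: int, old_h: int, new_w: int, new_h: int) -> str:
    """Scale the 2D box of one KITTI label line to a new image size.

    Only fields 4..7 (x1, y1, x2, y2) change. x is scaled by new_w / old_w
    and y by new_h / old_h, so non-uniform resizing is handled too.
    """
    fields = line.split()
    sx = new_w / old_w
    sy = new_h / old_h
    x1, y1, x2, y2 = map(float, fields[4:8])
    fields[4:8] = [f"{x1 * sx:.2f}", f"{y1 * sy:.2f}", f"{x2 * sx:.2f}", f"{y2 * sy:.2f}"]
    return " ".join(fields)


# Example: halve a 1280x720 label to 640x360.
print(resize_kitti_line("car 0 0 0 128 72 640 360 0 0 0 0 0 0 0", 1280, 720, 640, 360))
```

Keep in mind that downscaling shrinks small, distant vehicles too, so very small objects may fall below what the detector can learn at the lower resolution.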

For image quantity, first select part of your dataset and run lots of experiments to find better hyperparameters. Then increase the dataset size gradually.
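One way to follow this advice is to shuffle the image list once with a fixed seed and take growing prefixes, so each larger subset contains the smaller one and results across experiments stay comparable. The sketch below works on a list of image IDs; the function name and the fractions are hypothetical choices, and it does not balance classes, so check per-class counts on each subset.

```python
import random
from typing import List, Sequence


def nested_subsets(items: Sequence[str], fractions: Sequence[float], seed: int = 0) -> List[List[str]]:
    """Return progressively larger subsets of items, each containing the previous one.

    The list is shuffled once with a fixed seed and prefixes are taken, so the
    25% subset always includes the 10% subset.
    """
    order = list(items)
    random.Random(seed).shuffle(order)
    subsets = []
    for frac in sorted(fractions):
        n = max(1, round(len(order) * frac))
        subsets.append(order[:n])
    return subsets


# Example: 10% for hyperparameter search, then 25%, then the full dataset.
image_ids = [f"img_{i:04d}" for i in range(1000)]
for subset in nested_subsets(image_ids, [0.1, 0.25, 1.0]):
    print(len(subset))
```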