Training resnet-10 model on IR data

Hi Guys,

I am building a surveillance application that requires detection to work during both day and night. Is there a way to train the resnet-10 model on IR data? Alternatively, is there a way to avoid training and still get detections on IR data? Has anyone experimented with this before? Has the original resnet-10 model been trained on IR data?

Kindly help me out.

Thanks.

Hi,

Our resnet-10 model is trained on a color (RGB) dataset.
You can try it on IR images as-is, but we expect much lower accuracy.

To retrain the resnet-10 model, you can use the Transfer Learning Toolkit (TLT):
https://developer.nvidia.com/transfer-learning-toolkit

Thanks.

Hi AastaLLL,

I just downloaded the Transfer Learning Toolkit and found it very useful. Kudos to NVIDIA for the great work. In the spec file I can see some augmentations, such as color and spatial. I would like to simulate IR data by converting the color images to grayscale and feeding them in for training. Is this possible through online augmentation?

Thanks.

Hi,

If you convert the IR data into a supported format, e.g. RGB or grayscale, it should work.
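
As a rough illustration of the grayscale conversion discussed above, here is a minimal offline sketch in plain Python. It assumes pixels are (R, G, B) tuples of 0-255 integers and uses the ITU-R BT.601 luma weights. The function names are made up for this example and are not part of TLT. Note that a grayscale version of a visible-light image only approximates what an IR sensor would capture.

```python
# Sketch only: convert RGB pixels to grayscale, and optionally replicate the
# single channel into 3 channels for pipelines that expect RGB input.
# Assumption: an image is a list of rows, each row a list of (R, G, B) tuples.

def rgb_to_gray(pixel):
    """Weighted sum of R, G, B using ITU-R BT.601 luma coefficients."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))

def image_to_gray(image):
    """Return a 1-channel image (list of rows of ints)."""
    return [[rgb_to_gray(p) for p in row] for row in image]

def gray_to_three_channel(gray):
    """Copy the gray value into all three channels."""
    return [[(v, v, v) for v in row] for row in gray]

image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = image_to_gray(image)
print(gray)  # [[76, 150], [29, 255]]
```

In practice an image library would do this step on the files before they are converted to the training format.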

Thanks.

Hi AastaLLL,

Could you please guide me on how to convert the IR data to a supported format?

Thanks.

Hi,

It’s recommended to check with your IR camera provider first.
I suppose the provider can offer a solution for this.

Thanks.

What would be the next best thing to do? I will mostly be using day/night cameras. I tried retraining a TLT model, but I am not able to get results close to those of the DeepStream default model.

Hi neophyte1,
As we discussed offline, I experimented with a public thermal IR dataset, which consists of 1-channel images, to train detectnet_v2, and we achieved good results.
This demonstrates:

  • TLT’s capability to train on 1-channel images
  • TLT’s capability to achieve a good AP on thermal IR images
  • TLT’s use of class weights

Please find detailed information below:
Dataset:
Thermal IR dataset: https://www.flir.asia/oem/adas/adas-dataset-form/
Size: 640x512 jpegs
Train: 6760 images
Val: 1100 images
Objects:
Car: 41260
Person: 22371

Spec:
Modify parameters in spec file:
car_class_weight: 1.0
person_class_weight: 2.0
batch_size_per_gpu: 4
num_epochs: 250
output_image_channel: 1
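
For intuition on the class weights, here is a small sketch of one common heuristic: weight each class inversely to its object count. This is an assumption for illustration only. The post does not say how the 1.0 and 2.0 values were chosen, and `inverse_frequency_weights` is a made-up helper, not a TLT function.

```python
# Object counts reported for the dataset above.
counts = {"car": 41260, "person": 22371}

def inverse_frequency_weights(counts):
    """Weight = largest count / class count, so the most frequent class gets 1.0."""
    largest = max(counts.values())
    return {name: largest / n for name, n in counts.items()}

weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.2f}")
# car: 1.00
# person: 1.84
```

Under this heuristic the person class comes out near 1.84, which is in the same range as the 2.0 used in the spec.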

mAP
Mean average_precision (in %): 77.9860

class name    average precision (in %)
car           72.9039
person        83.0681
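
As a quick check, the reported mean is the unweighted average of the two per-class AP values:

```python
# Per-class average precision (in %) from the results above.
ap = {"car": 72.9039, "person": 83.0681}

# mAP is the plain mean over classes.
mean_ap = sum(ap.values()) / len(ap)
print(f"{mean_ap:.4f}")  # 77.9860
```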

Hi, I really appreciate the quick response. The FLIR dataset primarily concerns thermal/far IR. Can we expect similar results for near IR, which is what day/night cameras commonly produce? Another important thing: I am having a hard time understanding all the parameters of the spec file. I went through the TLT doc. Some of the parameters are easy to understand, like class weight. Other parameters, for example coverage threshold and cov_center_x, are harder to get an intuitive understanding of. Can you shed more light on these?

Hi neophyte1,
Do you know of a public dataset for the “Near IR” case you mentioned? I can run more experiments on it.

For cov_center_x, from the TLT user guide:

cov_center_x (float): x-coordinate of the center of object.
cov_center_y (float): y-coordinate of the center of object.

For coverage_threshold:

Grid cells with coverage lower than this threshold will be ignored. Valid range [0.0, 1.0].
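
To make the threshold more concrete, here is a minimal sketch of the filtering it describes. The cell dictionaries and the `filter_by_coverage` name are hypothetical and are not TLT's internal data structures.

```python
def filter_by_coverage(cells, coverage_threshold):
    """Drop grid cells whose coverage is lower than the threshold."""
    if not 0.0 <= coverage_threshold <= 1.0:
        raise ValueError("coverage_threshold must be in [0.0, 1.0]")
    return [c for c in cells if c["coverage"] >= coverage_threshold]

# Hypothetical per-cell outputs: grid position and predicted coverage.
cells = [
    {"cell": (0, 0), "coverage": 0.02},
    {"cell": (0, 1), "coverage": 0.40},
    {"cell": (1, 1), "coverage": 0.91},
]

kept = filter_by_coverage(cells, coverage_threshold=0.4)
print([c["cell"] for c in kept])  # [(0, 1), (1, 1)]
```

A higher threshold keeps fewer, more confident cells, while a lower one passes more candidates on to the clustering step.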

More:

dbscan_eps (float): DBSCAN eps parameter. The maximum distance between two samples
                for them to be considered as in the same neighborhood. Valid range [0.0, 1.0].
dbscan_min_samples (float): DBSCAN min samples parameter. The number of samples (or
                total weight) in a neighborhood for a point to be considered as a core point.
                This includes the point itself. Must be >= 0.0.
minimum_bounding_box_height (int): Minimum bbox height. Must be >= 0.
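
To build intuition for dbscan_eps and dbscan_min_samples, here is a minimal textbook DBSCAN sketch in plain Python. It clusters 2-D points using Euclidean distance, which is an assumption made for illustration. How TLT measures the distance between detections is not stated here, and all function names below are made up for this example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # not a core point (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
    return labels

def filter_min_height(boxes, minimum_bounding_box_height):
    """Keep (x1, y1, x2, y2) boxes that are at least the minimum height."""
    return [b for b in boxes if (b[3] - b[1]) >= minimum_bounding_box_height]

# Hypothetical detection centers in normalized image coordinates.
centers = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.13),
           (0.80, 0.80), (0.82, 0.79), (0.50, 0.20)]
labels = dbscan(centers, eps=0.05, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

A larger eps merges detections that are farther apart into one object, and a larger min_samples requires more overlapping detections before a cluster is accepted.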