TLT 3.0

Thank you for your reply.


However, I faced another error when reshaping the tensor.

Can you share your training spec file?

You want to train a 736x416 DetectNet_v2 model.
According to the Transfer Learning Toolkit 3.0 documentation, please check whether your images/labels are resized to 736x416 offline.

Reference: Tensor reshape error when evaluating a Detectnet_v2 model
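If it helps, the offline resize needs to cover the labels as well as the images. A minimal sketch of rescaling the 2D bbox fields of a KITTI label line (DetectNet_v2 uses KITTI-format labels; the helper name and the example resolutions are illustrative, and the images themselves can be resized with any tool, e.g. Pillow's `Image.resize`):

```python
# Sketch: scale the 2D bbox of one KITTI label line to match a resized image.
# KITTI line layout: type trunc occ alpha x1 y1 x2 y2 h w l x y z ry
def scale_kitti_line(line, sx, sy):
    """Scale bbox fields (indices 4-7) by the x/y resize factors."""
    f = line.split()
    f[4] = f"{float(f[4]) * sx:.2f}"  # x1
    f[5] = f"{float(f[5]) * sy:.2f}"  # y1
    f[6] = f"{float(f[6]) * sx:.2f}"  # x2
    f[7] = f"{float(f[7]) * sy:.2f}"  # y2
    return " ".join(f)

# Example: an original 1920x1080 image resized to 736x416
sx, sy = 736 / 1920, 416 / 1080
print(scale_kitti_line("face 0 0 0 100 200 300 400 0 0 0 0 0 0 0", sx, sy))
```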

Yes, I checked the link, and all my images have the same size:

Since you set the facedetectIR tlt model as the pretrained model, please resize your images/labels to 384x240 offline.
And set 384x240 in your training spec file.

https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir

Input

Gray Image whose values in RGB channels are the same. 384 X 240 X 3 (W x H x C) Channel Ordering of the Input: NCHW, where N = Batch Size, C = number of channels (3), H = Height of images (240), W = Width of the images (384)
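For reference, the matching piece of a DetectNet_v2 training spec would look roughly like this (a sketch, not a full config; only the preprocessing fields relevant to the input size are shown):

```
augmentation_config {
  preprocessing {
    output_image_width: 384
    output_image_height: 240
    output_image_channel: 3
  }
}
```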

Thank you Morganh for your support.
Concerning the load_key: I set it to tlt_encode. However, I am still having issues, even though I used the facedetect model, not the IR one.


What do you mean by “I used the facedetect model not the IR”?

I am working on the FaceNet model (face detect) provided by the Transfer Learning Toolkit.
For the pretrained model, there are two versions on NGC: one with IR and one without.

Using the pretrained model without IR gives me the “OSError: Unable to open file” error, and using the pretrained model with IR gives me the dimensions error. I will resize the images offline to 384x240 as you said and see if that works.
Meanwhile, however, I am trying with the pretrained model without IR.

For https://ngc.nvidia.com/catalog/models/nvidia:tlt_facenet, according to its overview,

its key is nvidia_tlt

Input

Grayscale Image whose values in RGB channels are the same. 736 X 416 X 3

For https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir, according to its overview,

its key is tlt_encode

Input

Gray Image whose values in RGB channels are the same. 384 X 240 X 3 (W x H x C) Channel Ordering of the Input: NCHW, where N = Batch Size, C = number of channels (3), H = Height of images (240), W = Width of the images (384)
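In other words, the -k value passed to training/evaluation must match the model that was downloaded. A sketch of the training call (paths are placeholders; substitute your own):

```shell
# Key must match the pretrained model:
#   tlt_facenet      -> nvidia_tlt   (736x416 input)
#   tlt_facedetectir -> tlt_encode   (384x240 input)
tlt detectnet_v2 train \
  -e /workspace/specs/train_spec.txt \
  -r /workspace/experiment_dir_unpruned \
  -k tlt_encode
```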


OK, thank you for your reply.
I just want to make sure whether the FaceDetect dataset is the same as the FaceDetect-IR dataset, only resized.
Thank you so much!

It shouldn’t be, since the labels are different.
“This model accepts 384x240x3 dimension input tensors and outputs 24x15x4 bbox coordinate tensor and 24x15x1 class confidence tensor.”
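The quoted output shapes follow from DetectNet_v2 predicting on a downsampled grid; the numbers imply a stride of 16 (384/24 = 240/15 = 16). As a quick check:

```python
# DetectNet_v2 predicts on a grid downsampled from the input resolution.
# The quoted 24x15 output for a 384x240 input implies a stride of 16.
def output_grid(width, height, stride=16):
    return width // stride, height // stride

print(output_grid(384, 240))  # FaceDetect-IR -> (24, 15)
print(output_grid(736, 416))  # FaceNet       -> (46, 26)
```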

Any remaining issue?

I converted the dataset into 384x240 images. However, when I trained the model, another error occurred:



Please remove result folder and retry.
Reference: DetectNet v2 training error - "ValueError: The zipfile extracted was corrupt. Please check your key "

I don’t think I am following you.
I can’t remove the zip.python file.
Could you provide more details, please?

I mean you can remove the experiment_dir_unpruned folder and retry.
Please back it up first.
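A minimal sketch of that backup-then-remove step (the folder name matches the -r results directory used for training; adjust the path to your own setup):

```python
import pathlib
import shutil

results = pathlib.Path("experiment_dir_unpruned")  # your training -r directory
results.mkdir(exist_ok=True)  # stands in here for the existing results folder

# Back it up first, then remove it so the next training run starts clean.
shutil.copytree(results, results.with_name(results.name + ".bak"))
shutil.rmtree(results)
```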

Thank you so much for your continuous help; it is working now.