TAO: customizing a pretrained model

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)

• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Classification (Resnet 18)

• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
I think this is what you mean: “tao info”
Configuration of the TAO Toolkit Instance

dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']

format_version: 2.0

toolkit_version: 3.22.05

published_date: 05/25/2022

• Training spec file(If have, please share here)
The one that comes with the example.

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

How do I customize the fc layer of the pretrained model? The provided Jupyter notebook doesn't mention anything about it. I am looking for something like:
pretrained_model.fc = fc
where I can set my own output dimensions.

Or do I have to use BYOM? Is there a template/example of BYOM (PyTorch ResNet18) where they change the last fc layer?

I think you are using the .hdf5 pretrained model downloaded from NGC. The output dimensions depend on how many classes you want to classify. It is not necessary to change the .hdf5 pretrained model.

If you want to use a third-party ONNX model as the pretrained model in TAO classification, you can go through BYOM Converter — TAO Toolkit 3.22.05 documentation

Yes, you are right. That's what I am using. I just added a few classes and the training worked, but when I ran the eval command it said there was a mismatch in the number of classes.
I thought it would be taken care of automatically based on the number of directories.

Am I missing some step?
(1) I added 5 more directories in the “formatted” directory
(2) The data got split across the test, eval, and train directories (I can see it)
(3) Training was successful
(4) Eval failed

How many classes are there in the training folder and in the test folder?
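To check, you can compare the class subdirectories of the two splits directly (a minimal stdlib sketch; the `train_dir` and `test_dir` arguments are placeholders for your own split directories):

```python
import os

def class_names(split_dir):
    """Return the sorted class subdirectory names under a split directory."""
    return sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )

def check_splits(train_dir, test_dir):
    """Report classes present in one split but missing from the other."""
    train, test = set(class_names(train_dir)), set(class_names(test_dir))
    return {
        "only_in_train": sorted(train - test),
        "only_in_test": sorted(test - train),
    }
```

If either list comes back non-empty, the splits disagree on the class set, which would explain an eval-time mismatch.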

The problem is solved. The trained model was not being written to the classification/output directory. I connected to the docker's bin/bash and manually copied it from the workspace output to the classification output. Since eval was picking up an older model, it complained about the mismatch. With the newly trained model it works fine. My bad; it looks like I didn't set up some environment variables properly. Thanks for taking a look at it.
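For anyone hitting the same symptom: the TAO launcher maps local directories into the container via `~/.tao_mounts.json`, so if the results path used in the notebook is not covered by a mount, outputs can land somewhere other than where eval looks. A sketch of that file, with placeholder paths that you would replace with your own:

```json
{
    "Mounts": [
        {
            "source": "/home/user/tao-experiments",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/user/tao-experiments/data",
            "destination": "/workspace/tao-experiments/data"
        }
    ]
}
```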

Thanks for the info. Glad to know it is working now.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.