Classification_pyt notebook train error

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) A6000
• Network Type (Classification/pyt)
• Training spec file tao_tutorials/notebooks/tao_launcher_starter_kit/classification_pyt/classification.ipynb at main · NVIDIA/tao_tutorials · GitHub
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Having an issue trying to train the example notebook. I have followed the steps exactly. I have a conda env with Python 3.8, and I have also tested with an env with Python 3.10.

RandomResizedCrop.__init__() got an unexpected keyword argument 'size'
Error executing job with overrides: ['results_dir=/results/classification_experiment', 'model.init_cfg.checkpoint=/workspace/tao-experiments/pretrained_fan_hybrid_small/pretrained_fan_classification_imagenet_vfan_hybrid_small/fan_hybrid_small.pth', 'train.train_config.runner.max_epochs=3']

/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/classification/scripts/train.py FAILED

Could you please share the full log and the training spec file?

Note: all of this may be fixed now. The version of the notebooks I cloned had this issue, but I checked the repo today and the spec file is now fixed. Either strange timing or a very quick fix. The "size" flag has been replaced with "scale", and RandomFlip now has the correct flag "prob"!

I found a workaround by changing the spec file. While researching this issue I found that 1) this PyTorch container is built on a deprecated mmdetection release, which may be the cause, and 2) changing the transform flags in the spec file to match the mmpretrain version inside the container fixes the error. It seems that "size" is no longer an accepted parameter and has been replaced by "scale". RandomFlip had the wrong flag too.
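For clarity, here is the relevant pipeline change in isolation; my full modified spec follows below. This is only a sketch: the commented-out lines show the old-style flags I assume the original notebook spec used, which is consistent with the "unexpected keyword argument 'size'" error above.

pipeline:
  - type: RandomResizedCrop
    #size: 224         # old flag name, rejected by the mmpretrain version in the container
    scale: 224          # accepted flag name
    backend: "pillow"
  - type: RandomFlip
    #flip_prob: 0.5     # old-style flag name (assumption about the original spec)
    prob: 0.5           # accepted flag name
    direction: "horizontal"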

train:
  exp_config:
    manual_seed: 49
  train_config:
    runner:
      max_epochs: 40
    checkpoint_config:
      interval: 1
    logging:
      interval: 500
    validate: True
    evaluation:
      interval: 1
    custom_hooks:
      - type: "EMAHook"
        momentum: 4e-5
        priority: "ABOVE_NORMAL"
dataset:
  data:
    samples_per_gpu: 8
    train:
      data_prefix: /data/cats_dogs_dataset/training_set/training_set/
      pipeline: # Augmentations alone
        - type: RandomResizedCrop
          #size: 224 #, 224
          scale: 224
          backend: "pillow"
        - type: RandomFlip
          prob: 0.5
          direction: "horizontal"
      classes: /data/cats_dogs_dataset/classes.txt
    val:
      data_prefix: /data/cats_dogs_dataset/val_set/val_set
      classes: /data/cats_dogs_dataset/classes.txt
    test:
      data_prefix: /data/cats_dogs_dataset/val_set/val_set
      classes: /data/cats_dogs_dataset/classes.txt

model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    custom_args:
      drop_path: 0.1
  head:
    type: "FANLinearClsHead"
    custom_args:
      head_init_scale: 1
    num_classes: 2
    loss:
      type: "CrossEntropyLoss"
      loss_weight: 1.0
      use_soft: False
Please look at this page in the documentation; the spec file shown there is incorrect.

May I know which notebook you cloned?
Could you please check if tao_tutorials/notebooks/tao_launcher_starter_kit/classification_pyt/specs/train_cats_dogs.yaml at main · NVIDIA/tao_tutorials · GitHub works?

For the mismatch between the documentation and the notebook's spec file, I will create a bug for tracking.


Yes, that was it! Thanks.