I'm having an issue training the example notebook. I followed the steps exactly. I have a conda env with Python 3.8 and have also tested with an env with Python 3.10.
RandomResizedCrop.__init__() got an unexpected keyword argument 'size'
Error executing job with overrides: ['results_dir=/results/classification_experiment', 'model.init_cfg.checkpoint=/workspace/tao-experiments/pretrained_fan_hybrid_small/pretrained_fan_classification_imagenet_vfan_hybrid_small/fan_hybrid_small.pth', 'train.train_config.runner.max_epochs=3']
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/classification/scripts/train.py FAILED
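To see why an old-style spec fails this way, here is a simplified sketch of how OpenMMLab-style registries typically build a pipeline step: the "type" key selects a class, and every remaining key is passed to that class's constructor as a keyword argument. A key the constructor does not accept therefore raises a TypeError like the one above. The classes, the TRANSFORMS dict, and build_transform are hypothetical stand-ins, not the real mmpretrain/mmcv code; only the "scale" and "prob" parameter names come from this thread.

```python
# Simplified sketch of how an OpenMMLab-style registry turns a pipeline entry into
# a transform object. All classes here are hypothetical stand-ins, not the real
# mmpretrain/mmcv implementations.


class RandomResizedCrop:
    """Stand-in whose constructor takes 'scale' (not 'size'), as the fixed spec expects."""

    def __init__(self, scale, backend="cv2"):
        self.scale = scale
        self.backend = backend


class RandomFlip:
    """Stand-in whose constructor takes 'prob', as the fixed spec expects."""

    def __init__(self, prob=0.5, direction="horizontal"):
        self.prob = prob
        self.direction = direction


# Hypothetical registry: maps the "type" string in the spec to a class.
TRANSFORMS = {"RandomResizedCrop": RandomResizedCrop, "RandomFlip": RandomFlip}


def build_transform(cfg):
    """Pop 'type', look up the class, and forward every other key as a kwarg."""
    args = dict(cfg)  # copy so the caller's config dict is not mutated
    cls = TRANSFORMS[args.pop("type")]
    return cls(**args)


# Old-style entry: 'size' is forwarded verbatim and the constructor rejects it.
try:
    build_transform({"type": "RandomResizedCrop", "size": 224, "backend": "pillow"})
except TypeError as err:
    # On Python 3.10+ the message includes the class name, e.g.
    # RandomResizedCrop.__init__() got an unexpected keyword argument 'size'
    print(err)

# Entry in the style of the modified spec builds without error.
crop = build_transform({"type": "RandomResizedCrop", "scale": 224, "backend": "pillow"})
```

The same mechanism explains the RandomFlip problem: any key the installed transform's constructor does not recognize is rejected at build time, before training starts.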
Could you please share the full log and the training spec file?
Note: this may all be fixed now. The version of the notebooks I cloned had this issue, but when I checked the repo today the spec file had been fixed. Either that is a strange coincidence or it was a very quick fix. The "size" key has been replaced by "scale", and RandomFlip now uses the correct key, "prob".
I found a workaround by changing the spec file. While researching the issue I found that 1) this PyTorch container is built on a deprecated version of mmdetection, which may be the cause, and 2) changing the transform keys in the spec file to match an older version of the mmpretrain keys makes it work. It seems that "size" is no longer a valid parameter and the value now goes under "scale". RandomFlip also had the wrong key.
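The key changes described above can be sketched as a small migration over a parsed pipeline list. This is a hypothetical helper, not part of TAO or mmpretrain. Only the "size" to "scale" rename is stated in this thread; the old RandomFlip key "flip_prob" is an assumption about what the earlier spec used.

```python
import copy

# Hypothetical rename table. "size" -> "scale" comes from this thread;
# "flip_prob" -> "prob" is an ASSUMPTION about the old RandomFlip key.
KEY_RENAMES = {
    "RandomResizedCrop": {"size": "scale"},
    "RandomFlip": {"flip_prob": "prob"},
}


def migrate_pipeline(pipeline):
    """Return a copy of `pipeline` with known old-style keys renamed."""
    migrated = copy.deepcopy(pipeline)  # leave the caller's list untouched
    for step in migrated:
        for old, new in KEY_RENAMES.get(step.get("type"), {}).items():
            if old in step:
                # Refuse to overwrite: both keys present needs a manual decision.
                if new in step:
                    raise ValueError(f"{step['type']}: both '{old}' and '{new}' are set")
                step[new] = step.pop(old)
    return migrated


old = [
    {"type": "RandomResizedCrop", "size": 224, "backend": "pillow"},
    {"type": "RandomFlip", "flip_prob": 0.5, "direction": "horizontal"},
]
new = migrate_pipeline(old)
print(new)
```

The clash check is deliberate: as far as I know, older mmcls releases used "scale" on RandomResizedCrop for the crop-area range rather than the output size, so an entry carrying both keys should be fixed by hand rather than renamed blindly.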
train:
  exp_config:
    manual_seed: 49
  train_config:
    runner:
      max_epochs: 40
    checkpoint_config:
      interval: 1
    logging:
      interval: 500
    validate: True
    evaluation:
      interval: 1
    custom_hooks:
      - type: "EMAHook"
        momentum: 4e-5
        priority: "ABOVE_NORMAL"
dataset:
  data:
    samples_per_gpu: 8
    train:
      data_prefix: /data/cats_dogs_dataset/training_set/training_set/
      pipeline: # Augmentations alone
        - type: RandomResizedCrop
          #size: 224 #, 224
          scale: 224
          backend: "pillow"
        - type: RandomFlip
          prob: 0.5
          direction: "horizontal"
      classes: /data/cats_dogs_dataset/classes.txt
    val:
      data_prefix: /data/cats_dogs_dataset/val_set/val_set
      classes: /data/cats_dogs_dataset/classes.txt
    test:
      data_prefix: /data/cats_dogs_dataset/val_set/val_set
      classes: /data/cats_dogs_dataset/classes.txt
model:
  backbone:
    type: "fan_small_12_p4_hybrid"
    custom_args:
      drop_path: 0.1
  head:
    type: "FANLinearClsHead"
    custom_args:
      head_init_scale: 1
    num_classes: 2
    loss:
      type: "CrossEntropyLoss"
      loss_weight: 1.0
      use_soft: False
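One way to catch this class of error before launching a long training job is to compare each pipeline entry's keys against the transform constructor's signature using Python's inspect module. This is a sketch under assumptions: the classes below are hypothetical stand-ins with the "scale"/"prob" keys used in the spec above, and in a real setup you would put the transform classes from the installed package into the registry instead (not shown). A constructor that accepts **kwargs cannot be checked this way.

```python
import inspect


class RandomResizedCrop:
    """Hypothetical stand-in; replace with the real transform class in practice."""

    def __init__(self, scale, backend="cv2"):
        self.scale, self.backend = scale, backend


class RandomFlip:
    """Hypothetical stand-in with the 'prob'/'direction' keys used in the spec."""

    def __init__(self, prob=0.5, direction="horizontal"):
        self.prob, self.direction = prob, direction


def unknown_keys(step, registry):
    """Return the keys in a pipeline entry that the transform's constructor would reject."""
    cls = registry[step["type"]]
    params = inspect.signature(cls).parameters  # signature of __init__ without 'self'
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []  # **kwargs accepts anything, so nothing can be flagged
    return sorted(k for k in step if k != "type" and k not in params)


registry = {"RandomResizedCrop": RandomResizedCrop, "RandomFlip": RandomFlip}

# Pipeline entries mirroring the modified spec's train pipeline.
pipeline = [
    {"type": "RandomResizedCrop", "scale": 224, "backend": "pillow"},
    {"type": "RandomFlip", "prob": 0.5, "direction": "horizontal"},
]
for step in pipeline:
    bad = unknown_keys(step, registry)
    if bad:
        print(f"{step['type']}: unsupported keys {bad}")
```

With the stand-ins above, the modified pipeline reports nothing, while an entry still using "size" would be flagged before any training starts.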
Please look at this page in the documentation; the spec file shown there is incorrect.
May I know which notebook you cloned?
Could you please check whether tao_tutorials/notebooks/tao_launcher_starter_kit/classification_pyt/specs/train_cats_dogs.yaml on the main branch of NVIDIA/tao_tutorials (GitHub) works?
As for the mismatch between the documentation and the notebook's spec file, I will file a bug to track it.