Please provide the following information when requesting support.
• Hardware RTX 3090
• Network Type Mask RCNN
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021
• Training spec file(If have, please share here)
the training spec file: maskrcnn_train_resnet50.txt (2.0 KB)
the re-train spec file: maskrcnn_retrain_resnet50.txt (2.0 KB)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I am trying Mask R-CNN instance segmentation with the TAO Toolkit.
I followed the maskrcnn.ipynb notebook to train on a single-class dataset. The TFRecord dataset was generated from COCO-format annotations with these scripts: preprocess_dataset.sh (4.5 KB) create_inviol_tf_record.py (12.9 KB)
The initial training and evaluation worked without problems.
After pruning, I started retraining the model with:
!tao mask_rcnn train -e $SPECS_DIR/maskrcnn_retrain_resnet50.txt \
-d $USER_EXPERIMENT_DIR/experiment_dir_retrain \
-k $KEY \
--gpus 1
The command failed with this error:
ValueError: Cannot reshape a tensor with 25690112 elements to shape [128,256,14,14] (6422528 elements) for 'mask_head_reshape_1/mask_head_reshape_1' (op: 'Reshape') with input shapes: [4,128,256,14,14], [4] and with input tensors computed as partial shapes: input[1] = [128,256,14,14].
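To make the mismatch in the error concrete, here is a minimal sketch that multiplies out the two shapes quoted in the message. It only reproduces the arithmetic; reading the leading dimension of 4 as a batch size is an assumption, not something the log states.

```python
from math import prod

# Shapes copied from the ValueError message above.
input_shape = [4, 128, 256, 14, 14]   # shape of the tensor fed to the Reshape op
target_shape = [128, 256, 14, 14]     # shape the Reshape op was asked to produce

input_elems = prod(input_shape)
target_elems = prod(target_shape)

print(input_elems)                   # 25690112, as reported in the error
print(target_elems)                  # 6422528, as reported in the error
print(input_elems // target_elems)   # 4, the extra leading dimension
```

The input holds exactly 4 times as many elements as the target, so the target shape appears to have been built without the leading dimension of size 4. If that dimension is the batch size (an assumption), the batch settings in the retrain spec are worth comparing with those used for the original training.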
Once the model has been pruned, there might be a decrease in accuracy. This happens because some previously useful weights may have been removed. To regain accuracy, NVIDIA recommends that you retrain this pruned model over the same dataset. To do this, run the tao mask_rcnn train command with an updated spec file that points to the newly pruned model by setting pruned_model_path.
Users are advised to turn off the regularizer during retraining. You may do this by setting the regularizer weights to 0 for both l1_weight_decay and l2_weight_decay.
The other parameters may be retained in the spec file from the previous training. train_batch_size and eval_batch_size must be kept unchanged.
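One way to check the last point is to compare the batch sizes in the two spec files before launching the retrain. The sketch below assumes each setting appears on its own line as `key: value`. The helper names `batch_sizes` and `batch_sizes_match` and the spec excerpts are hypothetical and are not taken from the attached files.

```python
import re

def batch_sizes(spec_text):
    """Return the train/eval batch sizes found in a spec file's text (None if absent)."""
    sizes = {}
    for key in ("train_batch_size", "eval_batch_size"):
        # Assumes a "key: <integer>" line, possibly indented inside a block.
        match = re.search(rf"^\s*{key}\s*:\s*(\d+)", spec_text, flags=re.MULTILINE)
        sizes[key] = int(match.group(1)) if match else None
    return sizes

def batch_sizes_match(train_spec_text, retrain_spec_text):
    """True when both specs declare the same train and eval batch sizes."""
    return batch_sizes(train_spec_text) == batch_sizes(retrain_spec_text)

# Hypothetical excerpts; the real values live in the attached spec files.
train_spec = "train_batch_size: 2\neval_batch_size: 4\n"
retrain_spec = "train_batch_size: 4\neval_batch_size: 4\n"

print(batch_sizes(train_spec))
print(batch_sizes(retrain_spec))
print(batch_sizes_match(train_spec, retrain_spec))
```

With real files, the two strings would come from reading maskrcnn_train_resnet50.txt and maskrcnn_retrain_resnet50.txt; a result of False would point at a batch size that changed between training and retraining.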