How to retrain the model after pruning?

In the documentation, there is only the instruction that the model needs to be retrained after pruning, but there are no details as to how retraining a model is different from the initial training (it just says to see the section on training the model, which only describes the initial training process). My assumption is that we may need to update the training spec or use a separate one that has different settings for the pretrained model weights, number of layers, etc. Can anyone provide some clarity on this process?

Also, the documentation advises that “NVIDIA recommends changing the threshold to keep the number of parameters in the model to within 10-20% of the original unpruned model”. I assume that this relates to the “-pth” optional argument, but it’s not clear how to use this to achieve 10-20% of the original unpruned model’s number of parameters. Can someone enlighten me?

Thanks in advance for any insight, comments, or suggestions!

Hi monocongo,

  1. You can refer to the training spec and the retraining spec included in the Jupyter notebook examples. Compare the two files to see how retraining differs from the initial training (see the sketch below).
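
For reference, the main difference you will usually find is in the model_config section of the retrain spec: it points pretrained_model_file at the pruned .tlt model and sets load_graph to true so the pruned structure is reused instead of the original template. Roughly like this (only a sketch based on the DetectNet_v2 example notebook; the path is a placeholder and the other fields in your spec will differ):

    # Retrain spec (sketch): only the fields that typically change from the train spec.
    model_config {
      pretrained_model_file: "/workspace/experiments/pruned/resnet18_pruned.tlt"  # output of tlt-prune
      load_graph: true   # reuse the pruned graph rather than rebuilding the unpruned template
      num_layers: 18     # unchanged from the train spec
    }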

Thanks, Morgan. You may want to mention this to the people responsible for the TLT documentation, as it seems like something you’d want to spell out clearly and not make users hunt around for like this. (I mention this because much of what I’m doing is documenting this process for our users, so I’m hyper-focused on documentation, and this seems to be an obvious omission that I assume other users would also like to have elucidated).


Hi monocongo,
Sorry for the inconvenience. I will sync with the internal team to polish the TLT documentation. Thanks for your suggestion.

Hi monocongo,
2) After running “tlt-prune”, you can see the following line in the log. For example:

Pruning ratio (pruned model / original model): 0.171

That means the (total params of pruned model)/(total params of unpruned model) is 17.1%.

To get a model pruned to 10%-20%, you should try different pruning thresholds and find one whose pruning ratio falls in that range. There is no explicit rule that gives the exact pth producing a 10%-20% pruned model; we always find it by trial. Usually a binary search is a good way to find an appropriate pth.
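
If you want to automate that search, below is a rough sketch (not an official TLT tool, just an illustration) that wraps tlt-prune in a binary search over pth and reads the “Pruning ratio” line from the log. The tlt-prune flags match the command shown later in this thread; the search bounds, the regular expression, and the helper names are my own assumptions and may need adjusting for your model.

    # Sketch: binary-search pth until the pruning ratio lands in the 10-20% range.
    # Assumes tlt-prune is on PATH and prints the "Pruning ratio (pruned model /
    # original model): X" line shown above; the lo/hi bounds are guesses to be tuned.
    import re
    import subprocess

    def prune_ratio(pth, unpruned_model, pruned_model, key):
        """Run tlt-prune at the given threshold and return the reported pruning ratio."""
        cmd = ["tlt-prune", "-pm", unpruned_model, "-o", pruned_model,
               "-k", key, "-pth", str(pth)]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        match = re.search(r"Pruning ratio \(pruned model / original model\): ([0-9.]+)",
                          result.stdout + result.stderr)
        return float(match.group(1))

    def find_pth(unpruned_model, pruned_model, key,
                 lo=0.0, hi=0.01, target=(0.10, 0.20), max_iter=20):
        """Binary search: a larger pth prunes more aggressively, so the ratio shrinks as pth grows."""
        for _ in range(max_iter):
            mid = (lo + hi) / 2.0
            ratio = prune_ratio(mid, unpruned_model, pruned_model, key)
            print("pth=%.8f -> pruning ratio %.3f" % (mid, ratio))
            if target[0] <= ratio <= target[1]:
                return mid, ratio
            if ratio > target[1]:   # model is still too large, prune harder
                lo = mid
            else:                   # pruned too much, back off
                hi = mid
        return mid, ratio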

Please also note that we need to consider the mAP (mean average precision) when setting the pth. After retraining the pruned model, we can see the mAP result. The mAP should not drop too much compared to the previous unpruned result.
There is a tradeoff between pruning ratio and mAP.


Hi Morgan,

I am a bit confused by the explanation above.

  1. The TLT documentation in the Getting Started Guide says:

" Note: NVIDIA recommends changing the threshold to keep the number of parameters in the model to within 10-20% of the original unpruned model. "

  2. The result of pruning gives:
[INFO] iva.common.magnet_prune: Pruning ratio (pruned model / original model): 0.207290698936

My understanding is that this reported ratio in (2), as per your explanation, means that the pruned model has only about 20% of the total parameters of the original model.

However, the language in the Getting Started Guide seems to indicate that the number of parameters in the pruned model should not be less than 80% of the original model. (This to me would make sense, as mAP would NOT reduce significantly. However, if you reduce the parameters to 10% of the original model, the mAP will plunge!)

So I am assuming that the target ratio for pruning as reported by the CLI should actually be in the range 0.80 to 0.90.

Is this correct? Or am I way off?

Hi pushkar,
For " Note: NVIDIA recommends changing the threshold to keep the number of parameters in the model to within 10-20% of the original unpruned model. ", the document indicate the number of parameters is recommended to keep within 10%~20%. I think it does not mention “should not be less than 80%”.

So ideally

if total_parameters in original_model = 50 million
      then
      total_parameters in pruned_model = 40 million to 45 million

Right?

No, to keep the number of parameters in the model to within 10-20% of the original unpruned model, that means 5 million to 10 million.

What’s curious is that after I prune my trained model (to ~15% of original) and then run an evaluation on it I get essentially the same mAP values, whereas I would expect the accuracy to drop precipitously after such a reduction in total parameters.

That’s interesting, because after I prune mine (Resnet10_Detectnet), my mAP goes from 67% to 1.2%.

Could I ask what parameters you are using to prune?

I am using a command like this:

$ tlt-prune -pm ${TRAINED_UNPRUNED_MODEL} -o ${OUTPUT_PRUNED} -k ${NGC_API_KEY} -pth 0.0000052

I tried many values for the pth argument and in the end settled on this one, and oddly enough it turned out to be the same value used within the documentation.