Pruning Criterion

• Network Type: Classification TF2
• TLT Version: v5.0.0

I want to understand how pruning works in classification models such as MobilenetV2 or EfficientNet_B0.

I used a threshold of 0.68. That means that for a conv2d layer, at least 68% of the filters will be removed in increments of the granularity (8), while making sure that at least the minimum number of filters (16) is retained. Is that right?

And if X filters are going to be removed per layer containing 32 filters, then the filters are sorted in the order of magnitude of L2 norm and top X filters out of 32 should be retained right?

My question is: when I export the trained model and the pruned model (before re-training) and compare them, the filters retained in the pruned version are not necessarily the top X. Most of them are, but a few towards the end have been retained even though, based on the L2 norm I calculated, they should not have been. Is there a gap in my understanding? If not, what am I missing? Does it have anything to do with batch normalization?

Any update about this?

For pruning of classification_tf2, there is source code in and

I read through it, and by the looks of it, the L2 norm is calculated for each filter.

But when I test that on the dense and pruned models, I see that the filters being kept are not necessarily the TopK based on L2 norm, which is what has me confused. Any insights on that would be helpful.

My process -

  1. Export dense model and pruned model to ONNX and save the weights for the same Conv2D layer.
  2. By looking at the biases retained, figure out which filters have been removed and make a list of the indices.
  3. Sort the filter indices with np.argsort(np.sqrt(np.sum(conv2D_weights**2, axis=tuple(range(1,conv2D_weights.ndim))))) and check whether the topK correspond to the actual list of indices. They don’t.
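
Step 3 can be sketched as follows. This is only an illustration of the check described above, not the TAO implementation: the helper names are hypothetical, a synthetic array stands in for the exported weights, and axis 0 is assumed to index the output filters (as in the formula in step 3). Note that np.argsort sorts in ascending order, so the filters with the largest norms are at the end of its result.

```python
import numpy as np

def filter_l2_norms(weights):
    """L2 norm of each filter; assumes axis 0 indexes the output filters."""
    reduce_axes = tuple(range(1, weights.ndim))
    return np.sqrt(np.sum(weights ** 2, axis=reduce_axes))

def top_k_filters(weights, k):
    """Indices of the k filters with the largest L2 norm, in increasing index order."""
    order = np.argsort(filter_l2_norms(weights))  # ascending by norm
    return sorted(int(i) for i in order[-k:])      # the last k have the largest norms

def compare_with_retained(weights, retained):
    """Return (in the top-K but not retained, retained but not in the top-K)."""
    expected = set(top_k_filters(weights, len(retained)))
    actual = set(retained)
    return sorted(expected - actual), sorted(actual - expected)

# Synthetic stand-in for the dense Conv2D weights (32 filters of shape 3x3x3).
rng = np.random.default_rng(0)
dense_weights = rng.normal(size=(32, 3, 3, 3)).astype(np.float32)

# If pruning kept exactly the top 16 by L2 norm, both lists would be empty.
retained = top_k_filters(dense_weights, 16)
missing, unexpected = compare_with_retained(dense_weights, retained)
print("in top-K but not retained:", missing)
print("retained but not in top-K:", unexpected)
```

With real data, dense_weights would be loaded from the saved array and retained would be the list obtained in step 2; any non-empty list points to a filter that disagrees with a plain L2 ranking of those weights.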

I can also upload the models if necessary, but I have tried it with Efficientnet B0 and MobilenetV2 on PASCAL VOC pruned with 0.68 threshold and default pruning params elsewhere

Yes, could you please share the models? It would also be appreciated if you could share more detailed steps for your finding, for example the result of step 2 when you “make a list of the indices”. Thanks.

Sure, here are the models: mobilenetV2 and its pruned version, both in .tlt and onnx format.
mobilenet_v2_bn_080.tlt (16.4 MB)
mobilenet_v2_pruned.onnx (491.1 KB)
mobilenet_v2.onnx (8.6 MB)
model_th=0.68_eq=union.tlt (739.0 KB)

Now I visualize the ONNX model using Netron and try to see which filter indices have been retained after pruning. I used 0.68 as the pruning threshold.

Comparing these two, the filter indices retained are 3,4,5,6,7,8,13,14,17,18,19,20,21,22,25,28
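
As a cross-check on the visual comparison, the retained indices can also be recovered programmatically. The sketch below is an assumption-laden alternative to comparing biases in Netron: it matches each pruned filter to the dense filter with identical weight values. The helper name is hypothetical, synthetic arrays stand in for the two exported weight tensors, axis 0 is assumed to index the output filters, and the input channels are assumed to be unpruned (as for a first conv layer fed by the image).

```python
import numpy as np

def retained_indices(dense, pruned, atol=1e-6):
    """For each pruned filter, find the index of the matching dense filter.

    Assumes axis 0 indexes output filters and that the remaining axes
    have the same shape in both arrays (input channels not pruned).
    """
    dense_flat = dense.reshape(dense.shape[0], -1)
    pruned_flat = pruned.reshape(pruned.shape[0], -1)
    indices = []
    for row in pruned_flat:
        # Dense filters whose values all agree with this pruned filter.
        matches = np.flatnonzero(np.all(np.isclose(dense_flat, row, atol=atol), axis=1))
        if matches.size != 1:
            raise ValueError(f"expected exactly one match, found {matches.size}")
        indices.append(int(matches[0]))
    return indices

# Synthetic stand-ins: a dense layer with 32 filters, and a "pruned" layer
# built by keeping the indices reported in the post.
rng = np.random.default_rng(0)
dense = rng.normal(size=(32, 3, 3, 3)).astype(np.float32)
keep = [3, 4, 5, 6, 7, 8, 13, 14, 17, 18, 19, 20, 21, 22, 25, 28]
pruned = dense[keep]

recovered = retained_indices(dense, pruned)
print(recovered)
```

Exact matching only works if pruning copies the kept filters unchanged; if the export modifies the weights, the lookup raises an error instead of silently guessing.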

After that, I save the weights of these models as dense.npy and prune.npy, because I read here that only the weights are used to calculate the L2 norm for the filters.
dense.npy (3.5 KB)
prune.npy (1.8 KB)

Now when I calculate the norm of the filters using the same formula as the one in the codebase, I arrange the indices of the filters based on it.

The Top 16 in this case, aren’t the same as the list of actual filters retained. 15,0,27 should’ve been retained but 3,22,19 are instead. I am trying to understand what is happening in this case.

Any update on this?

With your steps, I can reproduce the behavior. I need to check further.
Did you save the npy for other layers and check whether they show the same behavior?

I had tried it for a Depthwise Conv2D and got the same result. But it shouldn’t be the case even for the first Conv2D layer, right?

Any updates on this?

Could you please also go through again and check the indices? For example,

I added a print statement to print the indices and the indices retained are 3,4,5,6,7,8,13,14,17,18,19,20,21,22,25,28, which are the exact same indices as I can see in the Netron visualization of the ONNX version of the model

Could you please share more info about this? Thanks.


I’ve added this print statement here in the code at nvidia_tao_tf2 > model_optimization > pruning >

and this is a part of the output

The third row matches exactly with the indices. I’m not exactly sure what the first 2 rows are?

Will check further.

Other rows are retained_idx for other layers. Refer to

But were you able to check the discrepancy?

Revisiting the original question, for the conv1 layer, please print the info for
The result matches the pruned model’s retained indices.

Okay, but then why is it different from the printout in as well as what I see in netron? Any idea?

L313 is still collecting the indices during the exploration stage. L728 gets the retained idx once the exploration stage is done.
When I print at L738, the indices match the idx that I see in Netron.