[I] int8
[I] loadEngine: resnet10_int8.engine
[I] calib: resnet10_calibration.bin
[I] batch: 1
[I] iterations: 20
[I] output: output_cov/Sigmoid,output_bbox/BiasAdd
[I] useSpinWait
[I] resnet10_int8.engine has been successfully loaded.
[I] Average over 10 runs is 191.889 ms (host walltime is 192.029 ms, 99% percentile time is 192.058).
[I] Average over 10 runs is 191.845 ms (host walltime is 191.889 ms, 99% percentile time is 192.084).
[I] Average over 10 runs is 191.94 ms (host walltime is 191.987 ms, 99% percentile time is 192.217).
[I] Average over 10 runs is 191.886 ms (host walltime is 191.933 ms, 99% percentile time is 192.066).
[I] Average over 10 runs is 191.846 ms (host walltime is 191.889 ms, 99% percentile time is 191.963).
[I] Average over 10 runs is 44.4582 ms (host walltime is 44.4947 ms, 99% percentile time is 191.925).
[I] Average over 10 runs is 19.5758 ms (host walltime is 19.6063 ms, 99% percentile time is 19.612).
[I] Average over 10 runs is 19.5645 ms (host walltime is 19.597 ms, 99% percentile time is 19.58).
[I] Average over 10 runs is 19.57 ms (host walltime is 19.6024 ms, 99% percentile time is 19.5988).
[I] Average over 10 runs is 19.5688 ms (host walltime is 19.6004 ms, 99% percentile time is 19.5856).
[I] Average over 10 runs is 19.5784 ms (host walltime is 19.6093 ms, 99% percentile time is 19.6333).
[I] Average over 10 runs is 19.5812 ms (host walltime is 19.6125 ms, 99% percentile time is 19.6102).
[I] Average over 10 runs is 19.5711 ms (host walltime is 19.602 ms, 99% percentile time is 19.6002).
[I] Average over 10 runs is 19.5767 ms (host walltime is 19.6081 ms, 99% percentile time is 19.6218).
[I] Average over 10 runs is 19.5617 ms (host walltime is 19.5921 ms, 99% percentile time is 19.5978).
[I] Average over 10 runs is 19.6818 ms (host walltime is 19.7295 ms, 99% percentile time is 20.1788).
[I] Average over 10 runs is 19.585 ms (host walltime is 19.6164 ms, 99% percentile time is 19.6313).
[I] Average over 10 runs is 19.5785 ms (host walltime is 19.6101 ms, 99% percentile time is 19.6181).
[I] Average over 10 runs is 19.5814 ms (host walltime is 19.6127 ms, 99% percentile time is 19.6403).
[I] Average over 10 runs is 19.5799 ms (host walltime is 19.6115 ms, 99% percentile time is 19.6614).
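As an aside, note the drop from ~192 ms to ~19.6 ms partway through the log, which likely reflects warm-up (e.g. GPU clocks ramping up). The steady-state latency in such a log can be summarized with a small parser; this is a hypothetical helper, with the regex matched to the trtexec output format shown above and a `warmup` count chosen by assumption:

```python
import re

# A short excerpt of the trtexec-style log above.
LOG = """\
[I] Average over 10 runs is 191.889 ms (host walltime is 192.029 ms, 99% percentile time is 192.058).
[I] Average over 10 runs is 44.4582 ms (host walltime is 44.4947 ms, 99% percentile time is 191.925).
[I] Average over 10 runs is 19.5758 ms (host walltime is 19.6063 ms, 99% percentile time is 19.612).
[I] Average over 10 runs is 19.5645 ms (host walltime is 19.597 ms, 99% percentile time is 19.58).
"""

PATTERN = re.compile(r"Average over \d+ runs is ([\d.]+) ms")

def steady_state_mean(log: str, warmup: int = 2) -> float:
    """Mean of the per-run averages, skipping the first `warmup` entries,
    since the early runs are dominated by warm-up effects."""
    times = [float(m.group(1)) for m in PATTERN.finditer(log)]
    return sum(times[warmup:]) / len(times[warmup:])

print(f"steady-state mean: {steady_state_mean(LOG):.2f} ms")  # → 19.57 ms
```

Applied to the full log, this reports the ~19.6 ms steady-state figure rather than the misleading overall average that includes the warm-up runs.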
Pruning ratio (pruned model / original model): 1.0
Size of the pruned model:
total 19368
-rw-r--r-- 1 root root 19829544 Nov 22 09:23 resnet10_nopool_bn_detectnet_v2_pruned.tlt
Hi rog07o4z,
It seems that your trained model was not actually pruned, because the pruning ratio is 1.0.
What is your "-pth" value in the tlt-prune command? Did you re-train the pruned model afterwards?
The resnet10_nopool_bn_detectnet_v2_pruned.tlt you mentioned is a pruned tlt model which has not been retrained.
If you have retrained, could you also paste the size of the retrained model (i.e., resnet18_detector_pruned.tlt by default) and the size of the exported etlt model (resnet18_detector.etlt)?
I am asking because a pruned model gets higher inference performance than an unpruned one.
To clarify, the full process is: unpruned model → prune → pruned model → retrain → retrained model → export → etlt model
Hi,
Yes, I understand the concept of pruning, and I also retrained the model before applying it in the DeepStream pipeline.
The prune command is:
!tlt-prune -pm $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet10_detector.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ \
           -eq union \
           -pth 0.0000052 \
           -k $KEY
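For intuition on why such a small -pth can leave the model untouched: threshold-based pruning removes structures whose weight magnitudes fall below the threshold, so with a threshold of 0.0000052 almost nothing qualifies for removal. A toy illustration of magnitude-threshold pruning (not TLT's actual implementation; the norm range is an assumed, typical scale):

```python
import numpy as np

def prune_ratio(channel_norms: np.ndarray, pth: float) -> float:
    """Fraction of channels kept after dropping those whose norm falls
    below the threshold `pth` (toy model of magnitude pruning)."""
    kept = np.count_nonzero(channel_norms >= pth)
    return kept / channel_norms.size

rng = np.random.default_rng(0)
# Hypothetical channel norms; real networks typically have norms far above 5.2e-6.
norms = rng.uniform(0.001, 1.0, size=1000)

print(prune_ratio(norms, 0.0000052))  # → 1.0 (tiny threshold: nothing removed)
print(prune_ratio(norms, 0.1))        # a larger threshold removes some channels
```

A pruning ratio of 1.0 thus simply means the threshold was below every weight in the model; raising -pth is what actually shrinks it.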
Size of the retrained model:
total 19368
-rw-r--r-- 1 root root 19829328 Nov 22 10:31 resnet10_detector_pruned.tlt
Size of the original (pre-trained) model:
total 39268
-rw------- 1 root root 253 Nov 22 10:31 license.txt
-rw------- 1 root root 40205392 Nov 22 10:31 resnet10.hdf5
Since the reported size decreases from 39268 KB to 19368 KB, I expected that the pruning had worked fine.
Hi rog07o4z,
The resnet10.hdf5 is the pre-trained model; it is not related to the pruning ratio.
Your pruning ratio is 1.0, which means the trained model was effectively not pruned.
Now I can see your whole process:
1. You trained on your own data and got an unpruned model (size unknown).
2. After pruning (pruning ratio is 1.0), you got a 19 MB pruned model, resnet10_nopool_bn_detectnet_v2_pruned.tlt.
3. After retraining, you got a 19 MB retrained model, resnet10_detector_pruned.tlt.
4. Then you exported it as resnet10_detector.etlt (size unknown).
See step 2: could you try to prune more aggressively (a larger -pth value)? Refer to section 9 of the TLT documentation for more details.