Maskrcnn pruning not working

joel.kunjachanvarghese · July 8, 2024, 8:35am

Please provide the following information when requesting support.

• Hardware (NVIDIA RTX 3080Ti)
• Network Type (Mask_rcnn)
• TLT Version (tao_tf1)
• Training spec file
seed: 123
use_amp: False
warmup_steps: 0

checkpoint: "/workspace/tlt-experiments/maskrcnn/pretrained_resnet50/tlt_instance_segmentation_vresnet50/resnet50.hdf5"

learning_rate_steps: "[1, 2, 3]"
learning_rate_decay_levels: "[0.1, 0.02, 0.002]"
total_steps: 120000
num_epochs: 1
num_examples_per_epoch: 100
train_batch_size: 2
eval_batch_size: 4
num_steps_per_eval: 3
momentum: 0.9
l2_weight_decay: 0.0001
l1_weight_decay: 0.0
warmup_learning_rate: 0.0001
init_learning_rate: 0.02

pruned_model_path: "/workspace/tlt-experiments/maskrcnn/pruned_model/model.tlt"

data_config{
image_size: "(832, 1344)"
augment_input_data: True
eval_samples: 5
training_file_pattern: "/workspace/tao-tf1/nvidia_tao_tf1/cv/mask_rcnn/tfrecords/"
validation_file_pattern: "/workspace/tao-tf1/nvidia_tao_tf1/cv/mask_rcnn/tfrecords/"
val_json_file: "/workspace/tao-tf1/nvidia_tao_tf1/cv/dataset/coco/annotations/instances_val2017.json"

# dataset specific parameters
num_classes: 91
skip_crowd_during_training: True
max_num_instances: 200

}

maskrcnn_config {
nlayers: 50
arch: "resnet"
freeze_bn: True
freeze_blocks: "[0,1]"
gt_mask_size: 112

# Region Proposal Network
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_min_size: 0.

# Proposal layer.
batch_size_per_im: 512
fg_fraction: 0.25
fg_thresh: 0.5
bg_thresh_hi: 0.5
bg_thresh_lo: 0.

# Faster-RCNN heads.
fast_rcnn_mlp_head_dim: 1024
bbox_reg_weights: "(10., 10., 5., 5.)"

# Mask-RCNN heads.
include_mask: True
mrcnn_resolution: 28

# training
train_rpn_pre_nms_topn: 2000
train_rpn_post_nms_topn: 1000
train_rpn_nms_threshold: 0.7

# evaluation
test_detections_per_image: 100
test_nms: 0.5
test_rpn_pre_nms_topn: 1000
test_rpn_post_nms_topn: 1000
test_rpn_nms_thresh: 0.7

# model architecture
min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
anchor_scale: 8

# localization loss
rpn_box_loss_weight: 1.0
fast_rcnn_box_loss_weight: 1.0
mrcnn_weight_loss_mask: 1.0

}
• How to reproduce the issue?
python mask_rcnn/scripts/prune.py -m /workspace/tao-tf1/nvidia_tao_tf1/cv/mask_rcnn/results/resnet50/model.epoch-1.tlt -o /workspace/tao-tf1/nvidia_tao_tf1/cv/mask_rcnn/results/resnet50/prune
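
A quick sanity check on the spec above, as a sketch only. It assumes that num_epochs takes precedence over total_steps and that a single GPU is used; neither is stated in the post, although the result matches the model.ckpt-50 checkpoint that the error log reports restoring. The helper name steps_for_run and the num_gpus parameter are hypothetical.

```python
import math


def steps_for_run(num_epochs, num_examples_per_epoch, train_batch_size, num_gpus=1):
    """Estimate the number of optimizer steps for an epoch-based run.

    Assumption: one step consumes train_batch_size examples per GPU.
    """
    steps_per_epoch = math.ceil(num_examples_per_epoch / (train_batch_size * num_gpus))
    return num_epochs * steps_per_epoch


# Values copied from the training spec in this post.
spec = {
    "num_epochs": 1,
    "num_examples_per_epoch": 100,
    "train_batch_size": 2,
    "total_steps": 120000,
}

steps = steps_for_run(
    spec["num_epochs"],
    spec["num_examples_per_epoch"],
    spec["train_batch_size"],
)
print(steps)  # 50
```

Under these assumptions the run lasts 50 steps, far fewer than total_steps: 120000, so the model being pruned has seen very little training.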

Error Log:

2024-07-08 08:18:03.429982: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-08 08:18:04,116 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-08 08:18:04,136 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-08 08:18:04,138 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-07-08 08:18:04,397 [TAO Toolkit] [INFO] root 2102: Starting MaskRCNN pruning.
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: MLP/multilevel_propose_rois/level_2/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: MLP/multilevel_propose_rois/level_3/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: MLP/multilevel_propose_rois/level_4/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: MLP/multilevel_propose_rois/level_5/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: MLP/multilevel_propose_rois/level_6/
2024-07-08 08:18:11.206052: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
2024-07-08 08:18:12.985936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:12.986019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Ti Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.395
pciBusID: 0000:01:00.0
2024-07-08 08:18:12.986031: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-08 08:18:12.999542: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12
2024-07-08 08:18:13.000495: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcufft.so.11
2024-07-08 08:18:13.000678: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcurand.so.10
2024-07-08 08:18:13.002105: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusolver.so.11
2024-07-08 08:18:13.002627: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusparse.so.12
2024-07-08 08:18:13.002726: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8
2024-07-08 08:18:13.002781: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.002872: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.002922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1802] Adding visible gpu devices: 0
2024-07-08 08:18:13.026340: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2918400000 Hz
2024-07-08 08:18:13.026908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x610eee0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-07-08 08:18:13.026923: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-07-08 08:18:13.057062: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.057352: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6116400 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-07-08 08:18:13.057371: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3080 Ti Laptop GPU, Compute Capability 8.6
2024-07-08 08:18:13.057481: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.057574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Ti Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.395
pciBusID: 0000:01:00.0
2024-07-08 08:18:13.057588: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-08 08:18:13.057600: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12
2024-07-08 08:18:13.057606: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcufft.so.11
2024-07-08 08:18:13.057612: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcurand.so.10
2024-07-08 08:18:13.057617: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusolver.so.11
2024-07-08 08:18:13.057632: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusparse.so.12
2024-07-08 08:18:13.057637: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8
2024-07-08 08:18:13.057661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.057725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.057774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1802] Adding visible gpu devices: 0
2024-07-08 08:18:13.057790: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-08 08:18:13.061293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1214] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-07-08 08:18:13.061302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1220] 0
2024-07-08 08:18:13.061306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] 0: N
2024-07-08 08:18:13.061372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.061464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-08 08:18:13.061532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1359] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14109 MB memory) → physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
WARNING:tensorflow:From mask_rcnn/scripts/prune.py:164: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

2024-07-08 08:18:14,812 [TAO Toolkit] [WARNING] tensorflow 137: From mask_rcnn/scripts/prune.py:164: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From mask_rcnn/scripts/prune.py:165: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

2024-07-08 08:18:15,005 [TAO Toolkit] [WARNING] tensorflow 137: From mask_rcnn/scripts/prune.py:165: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

INFO:tensorflow:Restoring parameters from /tmp/tmprfr2z_o1/model.ckpt-50
2024-07-08 08:18:15,104 [TAO Toolkit] [INFO] tensorflow 1284: Restoring parameters from /tmp/tmprfr2z_o1/model.ckpt-50
2024-07-08 08:18:15,472 [TAO Toolkit] [INFO] nvidia_tao_tf1.core.pruning.pruning 981: Exploring graph for retainable indices
2024-07-08 08:18:15,473 [TAO Toolkit] [INFO] root 2102: Unknown layer type: <class 'tensorflow.python.keras.layers.convolutional.Conv2D'>
Traceback (most recent call last):
  File "mask_rcnn/scripts/prune.py", line 265, in <module>
    main()
  File "mask_rcnn/scripts/prune.py", line 261, in main
    raise e
  File "mask_rcnn/scripts/prune.py", line 249, in main
    run_pruning(args)
  File "mask_rcnn/scripts/prune.py", line 213, in run_pruning
    pruning_ratio, param_count = prune_graph(args, "train_graph.json")
  File "mask_rcnn/scripts/prune.py", line 186, in prune_graph
    pruned_model = prune(
  File "/workspace/tao-tf1/nvidia_tao_tf1/core/pruning/pruning.py", line 1606, in prune
    return pruner.prune(model, layer_config_overrides, output_layers_with_outbound_nodes)
  File "/workspace/tao-tf1/nvidia_tao_tf1/core/pruning/pruning.py", line 1211, in prune
    model = self._explore(model)
  File "/workspace/tao-tf1/nvidia_tao_tf1/core/pruning/pruning.py", line 1142, in _explore
    raise NotImplementedError("Unknown layer type: %s" % type(layer))
NotImplementedError: Unknown layer type: <class 'tensorflow.python.keras.layers.convolutional.Conv2D'>
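
One possible reading of this traceback, which nothing in the thread confirms: the pruner dispatches on layer class, and the Conv2D it received comes from tensorflow.python.keras rather than from the standalone keras package (the "Using TensorFlow backend." line suggests standalone Keras is also loaded). Two classes with the same name from different modules are unrelated as far as isinstance is concerned. The sketch below reproduces that effect with stand-in classes; make_conv2d_class, SUPPORTED_LAYER_TYPES, and explore are hypothetical names and are not taken from pruning.py.

```python
def make_conv2d_class(module_name):
    """Build a stand-in class named Conv2D that reports the given module path."""
    return type("Conv2D", (object,), {"__module__": module_name})


# Stand-ins for the two Keras flavours; no TensorFlow import is needed.
StandaloneConv2D = make_conv2d_class("keras.layers.convolutional")
TfKerasConv2D = make_conv2d_class("tensorflow.python.keras.layers.convolutional")

# Assumption: the dispatcher only knows about one of the two classes.
SUPPORTED_LAYER_TYPES = (StandaloneConv2D,)


def explore(layer):
    """Return a label for known layers and raise for anything else."""
    if isinstance(layer, SUPPORTED_LAYER_TYPES):
        return "prunable"
    raise NotImplementedError("Unknown layer type: %s" % type(layer))


known = explore(StandaloneConv2D())

try:
    explore(TfKerasConv2D())
except NotImplementedError as err:
    message = str(err)
    print(message)
```

If this reading applies, the error would point to how the model graph was built or loaded rather than to the -m and -o arguments themselves.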

Morganh · July 8, 2024, 8:46am

Your command is different from the one we use. Could you please run pruning as described in the notebook or the TAO user guide?
For the user guide, see MaskRCNN - NVIDIA Docs.

You can also run without the TAO launcher, as shown below.
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 /bin/bash
Then, inside the container:
# mask_rcnn prune xxx

joel.kunjachanvarghese · July 8, 2024, 9:12am

The command itself is not the problem. I am using the same command for other models in tao_tf1.

Morganh · July 8, 2024, 9:21am

You created a similar topic about a pruning error several days ago. See Error while pruning .tlt model created during efficientdet-d0 model. That one is for efficientdet_tf1.
So I suggest that you follow our official command to double-check.
In addition, please run a quick experiment to narrow this down.
Steps:

1. Run a quick training with only 1 epoch. Please create a new result folder to save the model.
2. Run pruning again.
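
The "new result folder" step can be sketched as follows, so that the quick run cannot pick up checkpoints left over from an earlier experiment. This is an illustration only: make_fresh_results_dir and the quick_run prefix are hypothetical, and the demo uses a temporary directory in place of the real results path.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_fresh_results_dir(root, prefix="quick_run"):
    """Create and return a new, empty results directory under root."""
    root = Path(root)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = root / f"{prefix}_{stamp}"
    suffix = 1
    # Never reuse an existing folder, even if two runs start in the same second.
    while candidate.exists():
        candidate = root / f"{prefix}_{stamp}_{suffix}"
        suffix += 1
    candidate.mkdir(parents=True)
    return candidate


# Demo in a scratch location; replace it with the actual results root.
with tempfile.TemporaryDirectory() as scratch:
    first = make_fresh_results_dir(scratch)
    second = make_fresh_results_dir(scratch)
    first_is_empty = not any(first.iterdir())
    dirs_are_distinct = first != second
    both_exist = first.is_dir() and second.is_dir()
```

The returned path would then be used as the output directory for the 1-epoch training, and the model saved there would be the input to the pruning run.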

yingliu · July 30, 2024, 3:40am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

system · August 13, 2024, 3:40am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
