TAO 5: Mask R-CNN export converts .tlt to .uff instead of .onnx

Pritam · January 9, 2025, 1:54pm

I trained a Mask R-CNN model and am now trying to export the .tlt file to .onnx, but the export generates a .uff file instead. I also tried adding --onnx_route tf2onnx to the export command, but that did not work either. Could you suggest how to get an .onnx file directly? I have not been able to convert the .uff file to .onnx.

I am using TAO version 5.5 on a machine with an NVIDIA 2080 Ti GPU and NVIDIA driver version 535.183.01.
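The GPU model and driver version above can also be read straight from nvidia-smi, which helps when reporting environment details. A minimal sketch, assuming nvidia-smi is on PATH; `query_gpus` and `parse_gpu_query` are hypothetical helper names:

```python
import subprocess
from typing import List, Tuple


def parse_gpu_query(csv_text: str) -> List[Tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`
    output into (gpu_name, driver_version) tuples, one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<name>, <driver_version>"; split on the last comma.
        name, driver = line.rsplit(",", 1)
        rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus() -> List[Tuple[str, str]]:
    """Run nvidia-smi and return the parsed GPU name / driver pairs."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails
    ).stdout
    return parse_gpu_query(out)
```

Calling `query_gpus()` in the notebook returns one tuple per visible GPU.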

# tao <task> export fails if the .onnx file already exists, so clear the export folder before running tao <task> export
#!rm -rf $LOCAL_EXPERIMENT_DIR/export
#!mkdir -p $LOCAL_EXPERIMENT_DIR/export 

# Generate .onnx file using tao container
!tao model mask_rcnn export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/model.epoch-5.tlt \
                      -e $SPECS_DIR/maskrcnn_train_resnet10.txt \
                      --gen_ds_config
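The cleanup commented out in this cell can also be done from Python in the same notebook. A minimal sketch, assuming LOCAL_EXPERIMENT_DIR holds the host-side experiment path; `reset_export_dir` and `find_exported_models` are hypothetical helpers. The cell passes no output path, and the log below shows the model was saved next to the .tlt in experiment_dir_unpruned rather than into export/, so that folder (assumed to be mounted at $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned on the host) is the one to scan for .onnx or .uff files:

```python
import shutil
from pathlib import Path
from typing import Dict, List


def reset_export_dir(experiment_dir: str) -> Path:
    """Delete and recreate <experiment_dir>/export, mirroring the commented-out
    `rm -rf` and `mkdir -p` lines in the export cell."""
    export_dir = Path(experiment_dir) / "export"
    # Equivalent of `rm -rf`; ignore_errors also covers a folder that does not exist yet.
    shutil.rmtree(export_dir, ignore_errors=True)
    # Equivalent of `mkdir -p`.
    export_dir.mkdir(parents=True, exist_ok=True)
    return export_dir


def find_exported_models(directory: str) -> Dict[str, List[str]]:
    """List the .onnx and .uff files directly inside `directory`,
    to see which format the export actually wrote."""
    root = Path(directory)
    return {ext: sorted(p.name for p in root.glob(f"*{ext}")) for ext in (".onnx", ".uff")}
```

For example, call `reset_export_dir(os.environ["LOCAL_EXPERIMENT_DIR"])` before the export cell and `find_exported_models(...)` on the experiment_dir_unpruned folder after it.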

2025-01-09 18:16:56,921 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-01-09 18:16:56,978 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2025-01-09 18:16:57,006 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2025-01-09 12:46:57.547332: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2025-01-09 12:46:57,581 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2025-01-09 12:46:58.734242: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
Using TensorFlow backend.
2025-01-09 12:46:58,826 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:46:58,848 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:46:58,851 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:46:59,067 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ce77zrex because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2025-01-09 12:46:59,200 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
2025-01-09 12:46:59.746133: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libnvinfer.so.8
2025-01-09 12:46:59.757967: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:47:01,131 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:47:01,153 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:47:01,156 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2025-01-09 12:47:01,473 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.app 264: Saving exported model to /workspace/tao-experiments/mask_rcnn/experiment_dir_unpruned/model.epoch-5.uff
2025-01-09 12:47:01,473 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.mask_rcnn.utils.spec_loader 47: Loading specification from /workspace/tao-experiments/mask_rcnn/specs/maskrcnn_train_resnet10.txt
2025-01-09 12:47:01,474 [TAO Toolkit] [INFO] root 2082: Loading weights from /workspace/tao-experiments/mask_rcnn/experiment_dir_unpruned/model.epoch-5.tlt
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpyu1yaprq', '_tf_random_seed': 123, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  allow_growth: true
  force_gpu_compatible: true
}
allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: TWO
  }
}
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7cfe22221610>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2025-01-09 12:47:01,705 [TAO Toolkit] [INFO] tensorflow 212: Using config: {'_model_dir': '/tmp/tmpyu1yaprq', '_tf_random_seed': 123, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  allow_growth: true
  force_gpu_compatible: true
}
allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: TWO
  }
}
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7cfe22221610>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Create CheckpointSaverHook.
2025-01-09 12:47:01,705 [TAO Toolkit] [INFO] tensorflow 541: Create CheckpointSaverHook.
[MaskRCNN] INFO    : [*] Limiting the amount of sample to: 5000
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:361: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2025-01-09 12:47:01,745 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:361: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2025-01-09 12:47:01,752 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
2025-01-09 12:47:02,434 [TAO Toolkit] [WARNING] tensorflow 1776: The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
2025-01-09 12:47:02,436 [TAO Toolkit] [WARNING] tensorflow 1776: The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
2025-01-09 12:47:02,439 [TAO Toolkit] [WARNING] tensorflow 1776: The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
2025-01-09 12:47:02,442 [TAO Toolkit] [WARNING] tensorflow 1776: The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
INFO:tensorflow:Calling model_fn.
2025-01-09 12:47:03,051 [TAO Toolkit] [INFO] tensorflow 1148: Calling model_fn.
[MaskRCNN] INFO    : ***********************
[MaskRCNN] INFO    : Loading model graph...
[MaskRCNN] INFO    : ***********************
[MaskRCNN] INFO    : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_2/
[MaskRCNN] INFO    : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_3/
[MaskRCNN] INFO    : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_4/
[MaskRCNN] INFO    : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_5/
[MaskRCNN] INFO    : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_6/
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
image_input (ImageInput)        [(8, 3, 640, 640)]   0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (8, 64, 320, 320)    9408        image_input[0][0]                
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (8, 64, 320, 320)    256         conv1[0][0]                      
__________________________________________________________________________________________________
activation (Activation)         (8, 64, 320, 320)    0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (8, 64, 160, 160)    0           activation[0][0]                 
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (8, 64, 160, 160)    36864       max_pooling2d[0][0]              
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (8, 64, 160, 160)    256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (8, 64, 160, 160)    0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (8, 64, 160, 160)    36864       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (8, 64, 160, 160)    4096        max_pooling2d[0][0]              
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (8, 64, 160, 160)    256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (8, 64, 160, 160)    256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add (Add)                       (8, 64, 160, 160)    0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (8, 64, 160, 160)    0           add[0][0]                        
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (8, 128, 80, 80)     73728       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (8, 128, 80, 80)     512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (8, 128, 80, 80)     0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (8, 128, 80, 80)     147456      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (8, 128, 80, 80)     8192        block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (8, 128, 80, 80)     512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (8, 128, 80, 80)     512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (8, 128, 80, 80)     0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (8, 128, 80, 80)     0           add_1[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (8, 256, 40, 40)     294912      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (8, 256, 40, 40)     1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (8, 256, 40, 40)     0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (8, 256, 40, 40)     589824      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (8, 256, 40, 40)     32768       block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (8, 256, 40, 40)     1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (8, 256, 40, 40)     1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (8, 256, 40, 40)     0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (8, 256, 40, 40)     0           add_2[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (8, 512, 20, 20)     1179648     block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (8, 512, 20, 20)     2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (8, 512, 20, 20)     0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (8, 512, 20, 20)     2359296     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (8, 512, 20, 20)     131072      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (8, 512, 20, 20)     2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (8, 512, 20, 20)     2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (8, 512, 20, 20)     0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (8, 512, 20, 20)     0           add_3[0][0]                      
__________________________________________________________________________________________________
l5 (Conv2D)                     (8, 256, 20, 20)     131328      block_4a_relu[0][0]              
__________________________________________________________________________________________________
l4 (Conv2D)                     (8, 256, 40, 40)     65792       block_3a_relu[0][0]              
__________________________________________________________________________________________________
FPN_up_4 (UpSampling2D)         (8, 256, 40, 40)     0           l5[0][0]                         
__________________________________________________________________________________________________
FPN_add_4 (Add)                 (8, 256, 40, 40)     0           l4[0][0]                         
                                                                 FPN_up_4[0][0]                   
__________________________________________________________________________________________________
l3 (Conv2D)                     (8, 256, 80, 80)     33024       block_2a_relu[0][0]              
__________________________________________________________________________________________________
FPN_up_3 (UpSampling2D)         (8, 256, 80, 80)     0           FPN_add_4[0][0]                  
__________________________________________________________________________________________________
FPN_add_3 (Add)                 (8, 256, 80, 80)     0           l3[0][0]                         
                                                                 FPN_up_3[0][0]                   
__________________________________________________________________________________________________
l2 (Conv2D)                     (8, 256, 160, 160)   16640       block_1a_relu[0][0]              
__________________________________________________________________________________________________
FPN_up_2 (UpSampling2D)         (8, 256, 160, 160)   0           FPN_add_3[0][0]                  
__________________________________________________________________________________________________
FPN_add_2 (Add)                 (8, 256, 160, 160)   0           l2[0][0]                         
                                                                 FPN_up_2[0][0]                   
__________________________________________________________________________________________________
post_hoc_d5 (Conv2D)            (8, 256, 20, 20)     590080      l5[0][0]                         
__________________________________________________________________________________________________
post_hoc_d2 (Conv2D)            (8, 256, 160, 160)   590080      FPN_add_2[0][0]                  
__________________________________________________________________________________________________
post_hoc_d3 (Conv2D)            (8, 256, 80, 80)     590080      FPN_add_3[0][0]                  
__________________________________________________________________________________________________
post_hoc_d4 (Conv2D)            (8, 256, 40, 40)     590080      FPN_add_4[0][0]                  
__________________________________________________________________________________________________
p6 (MaxPooling2D)               (8, 256, 10, 10)     0           post_hoc_d5[0][0]                
__________________________________________________________________________________________________
rpn (Conv2D)                    multiple             590080      post_hoc_d2[0][0]                
                                                                 post_hoc_d3[0][0]                
                                                                 post_hoc_d4[0][0]                
                                                                 post_hoc_d5[0][0]                
                                                                 p6[0][0]                         
__________________________________________________________________________________________________
rpn-class (Conv2D)              multiple             771         rpn[0][0]                        
                                                                 rpn[1][0]                        
                                                                 rpn[2][0]                        
                                                                 rpn[3][0]                        
                                                                 rpn[4][0]                        
__________________________________________________________________________________________________
rpn-box (Conv2D)                multiple             3084        rpn[0][0]                        
                                                                 rpn[1][0]                        
                                                                 rpn[2][0]                        
                                                                 rpn[3][0]                        
                                                                 rpn[4][0]                        
__________________________________________________________________________________________________
permute (Permute)               (8, 160, 160, 3)     0           rpn-class[0][0]                  
__________________________________________________________________________________________________
permute_2 (Permute)             (8, 80, 80, 3)       0           rpn-class[1][0]                  
__________________________________________________________________________________________________
permute_4 (Permute)             (8, 40, 40, 3)       0           rpn-class[2][0]                  
__________________________________________________________________________________________________
permute_6 (Permute)             (8, 20, 20, 3)       0           rpn-class[3][0]                  
__________________________________________________________________________________________________
permute_8 (Permute)             (8, 10, 10, 3)       0           rpn-class[4][0]                  
__________________________________________________________________________________________________
permute_1 (Permute)             (8, 160, 160, 12)    0           rpn-box[0][0]                    
__________________________________________________________________________________________________
permute_3 (Permute)             (8, 80, 80, 12)      0           rpn-box[1][0]                    
__________________________________________________________________________________________________
permute_5 (Permute)             (8, 40, 40, 12)      0           rpn-box[2][0]                    
__________________________________________________________________________________________________
permute_7 (Permute)             (8, 20, 20, 12)      0           rpn-box[3][0]                    
__________________________________________________________________________________________________
permute_9 (Permute)             (8, 10, 10, 12)      0           rpn-box[4][0]                    
__________________________________________________________________________________________________
anchor_layer (AnchorLayer)      OrderedDict([(2, (16 0           image_input[0][0]                
__________________________________________________________________________________________________
info_input (InfoInput)          [(8, 5)]             0                                            
__________________________________________________________________________________________________
MLP (MultilevelProposal)        ((8, 1000), (8, 1000 0           permute[0][0]                    
                                                                 permute_2[0][0]                  
                                                                 permute_4[0][0]                  
                                                                 permute_6[0][0]                  
                                                                 permute_8[0][0]                  
                                                                 permute_1[0][0]                  
                                                                 permute_3[0][0]                  
                                                                 permute_5[0][0]                  
                                                                 permute_7[0][0]                  
                                                                 permute_9[0][0]                  
                                                                 anchor_layer[0][0]               
                                                                 anchor_layer[0][1]               
                                                                 anchor_layer[0][2]               
                                                                 anchor_layer[0][3]               
                                                                 anchor_layer[0][4]               
                                                                 info_input[0][0]                 
__________________________________________________________________________________________________
multilevel_crop_resize (Multile (8, 1000, 256, 7, 7) 0           post_hoc_d2[0][0]                
                                                                 post_hoc_d3[0][0]                
                                                                 post_hoc_d4[0][0]                
                                                                 post_hoc_d5[0][0]                
                                                                 p6[0][0]                         
                                                                 MLP[0][1]                        
__________________________________________________________________________________________________
box_head_reshape1 (ReshapeLayer (8000, 12544)        0           multilevel_crop_resize[0][0]     
__________________________________________________________________________________________________
fc6 (Dense)                     (8000, 1024)         12846080    box_head_reshape1[0][0]          
__________________________________________________________________________________________________
fc7 (Dense)                     (8000, 1024)         1049600     fc6[0][0]                        
__________________________________________________________________________________________________
class-predict (Dense)           (8000, 2)            2050        fc7[0][0]                        
__________________________________________________________________________________________________
box-predict (Dense)             (8000, 8)            8200        fc7[0][0]                        
__________________________________________________________________________________________________
box_head_reshape2 (ReshapeLayer (8, 1000, 2)         0           class-predict[0][0]              
__________________________________________________________________________________________________
box_head_reshape3 (ReshapeLayer (8, 1000, 8)         0           box-predict[0][0]                
__________________________________________________________________________________________________
gpu_detections (GPUDetections)  ((8,), (8, 100, 4),  0           box_head_reshape2[0][0]          
                                                                 box_head_reshape3[0][0]          
                                                                 MLP[0][1]                        
                                                                 info_input[0][0]                 
__________________________________________________________________________________________________
multilevel_crop_resize_1 (Multi (8, 100, 256, 14, 14 0           post_hoc_d2[0][0]                
                                                                 post_hoc_d3[0][0]                
                                                                 post_hoc_d4[0][0]                
                                                                 post_hoc_d5[0][0]                
                                                                 p6[0][0]                         
                                                                 gpu_detections[0][1]             
__________________________________________________________________________________________________
mask_head_reshape_1 (ReshapeLay (800, 256, 14, 14)   0           multilevel_crop_resize_1[0][0]   
__________________________________________________________________________________________________
mask-conv-l0 (Conv2D)           (800, 256, 14, 14)   590080      mask_head_reshape_1[0][0]        
__________________________________________________________________________________________________
mask-conv-l1 (Conv2D)           (800, 256, 14, 14)   590080      mask-conv-l0[0][0]               
__________________________________________________________________________________________________
mask-conv-l2 (Conv2D)           (800, 256, 14, 14)   590080      mask-conv-l1[0][0]               
__________________________________________________________________________________________________
mask-conv-l3 (Conv2D)           (800, 256, 14, 14)   590080      mask-conv-l2[0][0]               
__________________________________________________________________________________________________
conv5-mask (Conv2DTranspose)    (800, 256, 28, 28)   262400      mask-conv-l3[0][0]               
__________________________________________________________________________________________________
mask_fcn_logits (Conv2D)        (800, 2, 28, 28)     514         conv5-mask[0][0]                 
__________________________________________________________________________________________________
mask_postprocess (MaskPostproce (8, 100, 28, 28)     0           mask_fcn_logits[0][0]            
                                                                 gpu_detections[0][2]             
__________________________________________________________________________________________________
mask_sigmoid (Activation)       (8, 100, 28, 28)     0           mask_postprocess[0][0]           
==================================================================================================
Total params: 24,646,107
Trainable params: 4,822,784
Non-trainable params: 19,823,323
__________________________________________________________________________________________________
INFO:tensorflow:Done calling model_fn.
2025-01-09 12:47:07,435 [TAO Toolkit] [INFO] tensorflow 1150: Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2025-01-09 12:47:07,729 [TAO Toolkit] [INFO] tensorflow 240: Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpzvprwraj/model.ckpt-175000
2025-01-09 12:47:07,731 [TAO Toolkit] [INFO] tensorflow 1284: Restoring parameters from /tmp/tmpzvprwraj/model.ckpt-175000
INFO:tensorflow:Running local_init_op.
2025-01-09 12:47:08,006 [TAO Toolkit] [INFO] tensorflow 500: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2025-01-09 12:47:08,034 [TAO Toolkit] [INFO] tensorflow 502: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 175000 into /tmp/tmp2p82l2u2/model.ckpt.
2025-01-09 12:47:08,895 [TAO Toolkit] [INFO] tensorflow 606: Saving checkpoints for 175000 into /tmp/tmp2p82l2u2/model.ckpt.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/mask_rcnn/export/exporter.py:244: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2025-01-09 12:47:19,956 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/mask_rcnn/export/exporter.py:244: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

INFO:tensorflow:Restoring parameters from /tmp/tmp2p82l2u2/model.ckpt-175000
2025-01-09 12:47:20,163 [TAO Toolkit] [INFO] tensorflow 1284: Restoring parameters from /tmp/tmp2p82l2u2/model.ckpt-175000
INFO:tensorflow:Froze 107 variables.
2025-01-09 12:47:20,457 [TAO Toolkit] [INFO] tensorflow 334: Froze 107 variables.
INFO:tensorflow:Converted 107 variables to const ops.
2025-01-09 12:47:20,540 [TAO Toolkit] [INFO] tensorflow 394: Converted 107 variables to const ops.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/mask_rcnn/export/exporter.py:287: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2025-01-09 12:47:20,732 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/mask_rcnn/export/exporter.py:287: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2025-01-09 12:47:20,733 [TAO Toolkit] [INFO] numba.cuda.cudadrv.driver 266: init
NOTE: UFF has been tested with TensorFlow 1.15.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: MultilevelCropAndResize_TRT yet.
Converting pyramid_crop_and_resize_mask as custom op: MultilevelCropAndResize_TRT
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/uff/converters/tensorflow/converter.py:226: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

2025-01-09 12:47:20,995 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/uff/converters/tensorflow/converter.py:226: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting nearest_upsampling_2 as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting nearest_upsampling_1 as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting nearest_upsampling as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: SpecialSlice_TRT yet.
Converting mrcnn_detection_bboxes as custom op: SpecialSlice_TRT
Warning: No conversion function registered for layer: GenerateDetection_TRT yet.
Converting generate_detections as custom op: GenerateDetection_TRT
Warning: No conversion function registered for layer: MultilevelProposeROI_TRT yet.
Converting multilevel_propose_rois as custom op: MultilevelProposeROI_TRT
Warning: No conversion function registered for layer: MultilevelCropAndResize_TRT yet.
Converting pyramid_crop_and_resize_box as custom op: MultilevelCropAndResize_TRT
DEBUG [/usr/local/lib/python3.8/dist-packages/uff/converters/tensorflow/converter.py:143] Marking ['generate_detections', 'mask_fcn_logits/BiasAdd'] as outputs
2025-01-09 12:47:21,217 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.mask_rcnn.export.exporter 300: **Converted model was saved into /workspace/tao-experiments/mask_rcnn/experiment_dir_unpruned/model.epoch-5.uff**
loading annotations into memory...
Done (t=1.15s)
creating index...
index created!
[01/09/2025-12:47:22] [TRT] [I] [MemUsageChange] Init CUDA: CPU +12, GPU +0, now: CPU 654, GPU 848 (MiB)
[01/09/2025-12:47:23] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +546, GPU +118, now: CPU 1254, GPU 966 (MiB)
[01/09/2025-12:47:24] [TRT] [W] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
[01/09/2025-12:47:24] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +10, now: CPU 1477, GPU 974 (MiB)
[01/09/2025-12:47:24] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 1478, GPU 984 (MiB)
[01/09/2025-12:47:24] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[01/09/2025-12:47:36] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[01/09/2025-12:49:33] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[01/09/2025-12:49:33] [TRT] [I] Total Activation Memory: 10870064128
[01/09/2025-12:49:33] [TRT] [I] Detected 1 inputs and 2 output network tensors.
[01/09/2025-12:49:33] [TRT] [I] Total Host Persistent Memory: 125232
[01/09/2025-12:49:33] [TRT] [I] Total Device Persistent Memory: 5828608
[01/09/2025-12:49:33] [TRT] [I] Total Scratch Memory: 854353408
[01/09/2025-12:49:33] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 67 MiB, GPU 3069 MiB
[01/09/2025-12:49:33] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 118 steps to complete.
[01/09/2025-12:49:33] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 10.272ms to assign 20 blocks to 118 nodes requiring 2348518400 bytes.
[01/09/2025-12:49:33] [TRT] [I] Total Activation Memory: 2348518400
[01/09/2025-12:49:34] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1894, GPU 1094 (MiB)
[01/09/2025-12:49:34] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 1895, GPU 1104 (MiB)
[01/09/2025-12:49:34] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +19, GPU +100, now: CPU 19, GPU 100 (MiB)
Execution status: PASS
2025-01-09 18:19:44,045 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Please help.

Thanks.

Morganh · January 10, 2025, 6:39am

Currently, Mask_rcnn does not support exporting to an ONNX file. We suggest using trtexec to generate a TensorRT engine for running inference on the GPU.
Please refer to Converting TAO-trained MaskRCNN models to ONNX for CPU inference - #4 by telo.
Alternatively, you can use Mask2Former. Please refer to the latest TAO 5.5 user guide or notebook.
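Because MaskRCNN export in this TAO version writes a `.uff` file (the first post's log reports `Converted model was saved into .../model.epoch-5.uff`), it can help to check which artifacts the export actually produced before building an engine with trtexec. Below is a minimal Python sketch for a notebook cell. It assumes the host-side experiment directory is available as `LOCAL_EXPERIMENT_DIR`, and `find_exported_models` is a hypothetical helper, not part of TAO.

```python
import os
from pathlib import Path


def find_exported_models(export_dir, extensions=(".onnx", ".uff", ".engine")):
    """Group model files in export_dir by extension (hypothetical helper, not a TAO API)."""
    found = {ext: [] for ext in extensions}
    for path in sorted(Path(export_dir).iterdir()):
        # Keep only regular files whose last suffix is one we track.
        if path.is_file() and path.suffix in found:
            found[path.suffix].append(path.name)
    return found


if __name__ == "__main__":
    # Assumption: LOCAL_EXPERIMENT_DIR is the host-side mount of the container's
    # /workspace/tao-experiments/mask_rcnn directory seen in the export log.
    exp_dir = os.path.join(os.environ.get("LOCAL_EXPERIMENT_DIR", "."), "experiment_dir_unpruned")
    if os.path.isdir(exp_dir):
        models = find_exported_models(exp_dir)
        if models[".uff"] and not models[".onnx"]:
            print("Export produced UFF only:", models[".uff"])
    else:
        print(f"{exp_dir} not found; set LOCAL_EXPERIMENT_DIR first")
```

The helper only lists files. It does not convert anything, since there is no UFF-to-ONNX path in this workflow.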

Pritam · January 13, 2025, 8:24am

Dear @Morganh,

We are trying to train a Mask2Former_inst model, but training crashes automatically after 1 epoch.

Below is the configuration.

results_dir: /results_inst/
dataset:
  contiguous_id: True
  label_map: /specs/labelmap_inst.json
  train:
    type: 'coco'
    name: "coco_2017_train"
    instance_json: "/data/raw-data/annotations/coco_annotations_train_fixed_largeset.json"
    img_dir: "/data/raw-data/train"
    batch_size: 8
    num_workers: 2
  val:
    type: 'coco'
    name: "coco_2017_val"
    instance_json: "/data/raw-data/annotations/coco_annotations_val_fixed_largeset.json"
    img_dir: "/data/raw-data/val"
    batch_size: 1
    num_workers: 2
  test:
    img_dir: /data/raw-data/val
    batch_size: 1
  augmentation:
    train_min_size: [640]
    train_max_size: 640
    train_crop_size: [640, 640]
    test_min_size: 640
    test_max_size: 640
train:
  precision: 'fp16'
  num_gpus: 1
  checkpoint_interval: 1
  validation_interval: 1
  num_epochs: 50
  optim:
    lr_scheduler: "MultiStep"
    milestones: [44, 48]
    type: "AdamW"
    lr: 0.0001
    weight_decay: 0.05
model:
  object_mask_threshold: 0.1
  overlap_threshold: 0.8
  mode: "instance"
  backbone:
    pretrained_weights: "/workspace/tao-experiments/mask2former/swin_tiny_patch4_window7_224_22k.pth"
    type: "swin"
    swin:
      type: "tiny"
      window_size: 7
      ape: False
      pretrain_img_size: 224
  mask_former:
    num_object_queries: 100
  sem_seg_head:
    norm: "GN"
    num_classes: 80
export:
  input_channel: 3
  input_width: 640
  input_height: 640
  opset_version: 17
  batch_size: -1  # dynamic batch size
  on_cpu: False
gen_trt_engine:
  gpu_id: 0
  input_channel: 3
  input_width: 640
  input_height: 640
  tensorrt:
    data_type: fp16
    workspace_size: 4096
    min_batch_size: 1
    opt_batch_size: 1
    max_batch_size: 1
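Before re-running a long training job, a few internal consistency checks on this spec can be done in plain Python. This is a sanity-check sketch, not a diagnosis of the crash. It assumes the spec has been parsed into a nested dict (for example with PyYAML) and that the label-map JSON is a list with one entry per category. That format is an assumption, so check it against the TAO user guide. `check_mask2former_spec` is a hypothetical helper.

```python
import json


def check_mask2former_spec(spec, label_map):
    """Return human-readable problems found in a parsed Mask2Former spec (hypothetical helper).

    spec: the YAML spec as a nested dict.
    label_map: parsed label-map JSON, assumed to be a list with one entry per category.
    """
    problems = []

    # 1. Class count in the head vs. entries in the label map.
    num_classes = spec["model"]["sem_seg_head"]["num_classes"]
    if num_classes != len(label_map):
        problems.append(
            f"model.sem_seg_head.num_classes={num_classes}, "
            f"but the label map has {len(label_map)} entries"
        )

    # 2. LR milestones should be increasing and fall inside the training run.
    num_epochs = spec["train"]["num_epochs"]
    milestones = spec["train"]["optim"].get("milestones", [])
    if list(milestones) != sorted(milestones):
        problems.append(f"train.optim.milestones {milestones} are not increasing")
    late = [m for m in milestones if m >= num_epochs]
    if late:
        problems.append(f"milestones {late} are not below num_epochs={num_epochs}")

    # 3. TensorRT optimization profiles need min <= opt <= max batch size.
    trt = spec.get("gen_trt_engine", {}).get("tensorrt", {})
    if trt:
        lo, opt, hi = trt["min_batch_size"], trt["opt_batch_size"], trt["max_batch_size"]
        if not lo <= opt <= hi:
            problems.append(f"TensorRT batch sizes min={lo}, opt={opt}, max={hi} are not ordered")

    return problems


if __name__ == "__main__":
    # Values copied from the spec above; the two-entry label map is made up for illustration.
    spec = {
        "model": {"sem_seg_head": {"num_classes": 80}},
        "train": {"num_epochs": 50, "optim": {"milestones": [44, 48]}},
        "gen_trt_engine": {"tensorrt": {"min_batch_size": 1, "opt_batch_size": 1, "max_batch_size": 1}},
    }
    label_map = json.loads('[{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]')
    for problem in check_mask2former_spec(spec, label_map):
        print(problem)
```

An empty result only means these specific fields agree with each other. It says nothing about the dataset or annotations.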

Training Section:

print("For multi-GPU, set NUM_TRAIN_GPUS based on your machine.")
os.environ["NUM_TRAIN_GPUS"] = "1"
os.environ["HYDRA_FULL_ERROR"] = "1"
!tao model mask2former train -e $SPECS_DIR/spec_inst1.yaml \
           train.num_gpus=$NUM_TRAIN_GPUS \
           results_dir=$RESULTS_DIR

Training logs:

/usr/local/lib/python3.6/pty.py:84: ResourceWarning: Unclosed socket <zmq.Socket(zmq.PUSH) at 0x782256094648>
  pid, fd = os.forkpty()
For multi-GPU, set NUM_TRAIN_GPUS based on your machine.
2025-01-13 12:04:17,530 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-01-13 12:04:17,581 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2025-01-13 12:04:17,603 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 293: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-01-13 12:04:17,603 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2025-01-13 06:34:21,081 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
sys:1: UserWarning: 
'spec_inst1.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/hydra/hydra_runner.py:107: UserWarning: 
'spec_inst1.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Train results will be saved at: /results_inst/train
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/loggers/api_logging.py:236: UserWarning: Log file already exists at /results_inst/train/status.json
  rank_zero_warn(
Seed set to 1234
loading annotations into memory...
Done (t=5.39s)
creating index...
index created!
/usr/local/lib/python3.10/dist-packages/torch/functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3553.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Loading backbone weights from: /workspace/tao-experiments/mask2former/swin_tiny_patch4_window7_224_22k.pth
The backbone weights were loaded successfuly.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:652: Checkpoint directory /results_inst/train exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name      | Type            | Params
----------------------------------------------
0 | model     | MaskFormerModel | 47.4 M
1 | criterion | SetCriterion    | 0     
----------------------------------------------
47.4 M    Trainable params
0         Non-trainable params
47.4 M    Total params
189.687   Total estimated model params size (MB)

Sanity Checking: |          | 0/? [00:00<?, ?it/s]
loading annotations into memory...
Done (t=0.88s)
creating index...
index created!

Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  2.10it/s]
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/mask2former/model/pl_model.py:443: RuntimeWarning: invalid value encountered in divide
  iou = total_area_intersect / total_area_union

                                                                           
loading annotations into memory...
Done (t=5.51s)
creating index...
index created!

Epoch 0: 100%|██████████| 6250/6250 [1:25:22<00:00,  1.22it/s, v_num=1, train_loss=6.460, lr=0.0001]
Validation: |          | 0/? [00:00<?, ?it/s]
Validation:   0%|          | 0/7927 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/7927 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 1/7927 [00:00<17:36,  7.50it/s]
Validation DataLoader 0:   0%|          | 2/7927 [00:00<16:02,  8.23it/s]
Validation DataLoader 0:   0%|          | 3/7927 [00:00<15:23,  8.58it/s]
Validation DataLoader 0:   0%|          | 4/7927 [00:00<14:18,  9.23it/s]
Validation DataLoader 0:   0%|          | 5/7927 [00:00<13:48,  9.56it/s]
Validation DataLoader 0:   0%|          | 6/7927 [00:00<12:56, 10.20it/s]
Validation DataLoader 0:   0%|          | 7/7927 [00:00<12:38, 10.44it/s]
Validation DataLoader 0:   0%|          | 8/7927 [00:00<12:48, 10.31it/s]
Validation DataLoader 0:   0%|          | 9/7927 [00:00<12:54, 10.22it/s]
Validation DataLoader 0:   0%|          | 10/7927 [00:00<13:02, 10.12it/s]
Validation DataLoader 0:   0%|          | 11/7927 [00:01<12:37, 10.45it/s]
Validation DataLoader 0:   0%|          | 12/7927 [00:01<12:17, 10.73it/s]
Validation DataLoader 0:   0%|          | 13/7927 [00:01<11:59, 10.99it/s]
Validation DataLoader 0:   0%|          | 14/7927 [00:01<11:54, 11.08it/s]
Validation DataLoader 0:   0%|          | 15/7927 [00:01<12:01, 10.97it/s]
Validation DataLoader 0:   0%|          | 16/7927 [00:01<12:10, 10.83it/s]
Validation DataLoader 0:   0%|          | 17/7927 [00:01<12:16, 10.74it/s]
Validation DataLoader 0:   0%|          | 18/7927 [00:01<12:25, 10.61it/s]
Validation DataLoader 0:   0%|          | 19/7927 [00:01<12:30, 10.54it/s]
Validation DataLoader 0:   0%|          | 20/7927 [00:01<12:25, 10.61it/s]
Validation DataLoader 0:   0%|          | 21/7927 [00:01<12:28, 10.56it/s]
Validation DataLoader 0:   0%|          | 22/7927 [00:02<12:24, 10.62it/s]
Validation DataLoader 0:   0%|          | 23/7927 [00:02<12:28, 10.57it/s]
Validation DataLoader 0:   0%|          | 24/7927 [00:02<12:32, 10.50it/s]
Validation DataLoader 0:   0%|          | 25/7927 [00:02<12:36, 10.45it/s]
Validation DataLoader 0:   0%|          | 26/7927 [00:02<12:39, 10.41it/s]
Validation DataLoader 0:   0%|          | 27/7927 [00:02<12:42, 10.36it/s]
Validation DataLoader 0:   0%|          | 28/7927 [00:02<12:44, 10.33it/s]
Validation DataLoader 0:   0%|          | 29/7927 [00:02<12:40, 10.38it/s]
Validation DataLoader 0:   0%|          | 30/7927 [00:02<12:38, 10.41it/s]
Validation DataLoader 0:   0%|          | 31/7927 [00:02<12:41, 10.37it/s]
Validation DataLoader 0:   0%|          | 32/7927 [00:03<12:43, 10.34it/s]
Validation DataLoader 0:   0%|          | 33/7927 [00:03<12:45, 10.31it/s]
Validation DataLoader 0:   0%|          | 34/7927 [00:03<12:47, 10.28it/s]
Validation DataLoader 0:   0%|          | 35/7927 [00:03<12:44, 10.32it/s]
Validation DataLoader 0:   0%|          | 36/7927 [00:03<12:46, 10.30it/s]
Validation DataLoader 0:   0%|          | 37/7927 [00:03<12:43, 10.34it/s]
Validation DataLoader 0:   0%|          | 38/7927 [00:03<12:37, 10.42it/s]
Validation DataLoader 0:   0%|          | 39/7927 [00:03<12:35, 10.45it/s]
Validation DataLoader 0:   1%|          | 40/7927 [00:03<12:38, 10.40it/s]
Validation DataLoader 0:   1%|          | 41/7927 [00:03<12:37, 10.41it/s]
Validation DataLoader 0:   1%|          | 42/7927 [00:04<12:36, 10.42it/s]
Validation DataLoader 0:   1%|          | 43/7927 [00:04<12:38, 10.40it/s]
...

Validation DataLoader 0: 100%|█████████▉| 7914/7927 [12:50<00:01, 10.28it/s]
Validation DataLoader 0: 100%|█████████▉| 7915/7927 [12:50<00:01, 10.28it/s]
Validation DataLoader 0: 100%|█████████▉| 7916/7927 [12:50<00:01, 10.28it/s]
Validation DataLoader 0: 100%|█████████▉| 7917/7927 [12:50<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7918/7927 [12:50<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7919/7927 [12:50<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7920/7927 [12:50<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7921/7927 [12:50<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7922/7927 [12:51<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7923/7927 [12:51<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7924/7927 [12:51<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7925/7927 [12:51<00:00, 10.27it/s]
Validation DataLoader 0: 100%|█████████▉| 7926/7927 [12:51<00:00, 10.28it/s]
Validation DataLoader 0: 100%|██████████| 7927/7927 [12:51<00:00, 10.28it/s]
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/mask2former/model/pl_model.py:443: RuntimeWarning: invalid value encountered in divide
  iou = total_area_intersect / total_area_union


                                                                            
Epoch 0: 100%|██████████| 6250/6250 [1:38:14<00:00,  1.06it/s, v_num=1, train_loss=6.460, lr=0.0001, val_loss=11.20, mIoU=1.000, all_acc=1.000]
[2025-01-13 08:15:15,069 - TAO Toolkit - root - INFO] Sending telemetry data.
[2025-01-13 08:15:15,082 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2025-01-13 08:15:15,085 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'train', 'network': 'mask2former', 'gpu': ['NVIDIA-RTX-A4000'], 'success': False, 'time_lapsed': 6053} to https://api.tao.ngc.nvidia.com.
[2025-01-13 08:15:16,813 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2025-01-13 08:15:16,814 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2025-01-13 08:15:16,814 - TAO Toolkit - root - WARNING] Execution status: FAIL
2025-01-13 13:45:20,751 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Where are we making a mistake? Please help.

Thanks.
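The `RuntimeWarning: invalid value encountered in divide` lines in the log above are raised at `pl_model.py:443`, `iou = total_area_intersect / total_area_union`. That message is NumPy's warning for 0/0. A class that appears in neither the prediction nor the ground truth has a union of 0, so its IoU becomes NaN. On its own, this warning is not necessarily the cause of the failure. The minimal sketch below illustrates the effect and a guarded variant that keeps absent classes as NaN and averages with `np.nanmean`. It is an illustration, not TAO's code, and `per_class_iou` is a hypothetical name.

```python
import numpy as np


def per_class_iou(intersect, union):
    """Per-class IoU; classes with an empty union are reported as NaN without a warning."""
    intersect = np.asarray(intersect, dtype=np.float64)
    union = np.asarray(union, dtype=np.float64)
    iou = np.full_like(union, np.nan)
    # Only divide where the union is non-zero; the other entries keep NaN.
    np.divide(intersect, union, out=iou, where=union > 0)
    return iou


if __name__ == "__main__":
    intersect = np.array([50.0, 0.0, 10.0])
    union = np.array([100.0, 0.0, 40.0])

    # Unguarded division: with NumPy's default error settings, 0/0 gives NaN
    # and emits the same RuntimeWarning that appears in the training log.
    raw = intersect / union
    print(raw)

    guarded = per_class_iou(intersect, union)
    # nanmean averages only over classes that actually occur.
    print(guarded, np.nanmean(guarded))
```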

Morganh · January 13, 2025, 8:34am

Could you please create a new forum topic? Thanks!

Pritam · January 13, 2025, 9:11am

Yes, we have created it: Mask2Former_inst model training crashed after 1 epoch

system · February 11, 2025, 6:26am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
