I am trying to train Mask R-CNN using the ‘TLT MaskRCNN example use case’ notebook, and the Jupyter notebook hangs during training.
There is no error message or warning. I downloaded the training and validation instance segmentation tfrecords from CVAT.
I am using this Docker container: nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
I have tested the container with the COCO dataset and it works fine, but after switching to the new dataset the Jupyter notebook freezes during training.
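Since the only thing that changed between the working run and the hanging run is the dataset, one thing that may be worth ruling out is whether the CVAT-exported tfrecords are readable and non-empty. Below is a minimal sketch of such a check, not part of TLT. It assumes the standard TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload) and only walks the framing without verifying the CRCs. The function name `count_tfrecord_records` and the example path are hypothetical.

```python
import struct


def count_tfrecord_records(path):
    """Count records in a TFRecord file by walking its framing.

    Assumes the standard layout per record:
    uint64 length | uint32 length CRC | payload | uint32 payload CRC.
    CRCs are read but NOT verified; this only detects empty or truncated files.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError(f"truncated record header after {count} records")
            (length,) = struct.unpack("<Q", header[:8])
            body = f.read(length + 4)  # payload + 4-byte CRC of the payload
            if len(body) < length + 4:
                raise ValueError(f"truncated record payload after {count} records")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical path; replace with the actual tfrecord file from CVAT.
    import sys
    if len(sys.argv) > 1:
        print(count_tfrecord_records(sys.argv[1]))
```

A count of zero or a `ValueError` would point at the export rather than at the training setup. A non-zero count only shows the file is well framed, not that the records contain the fields the training pipeline expects.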
For multi-GPU, change --gpus based on your machine.
2021-04-12 20:36:35.245585: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:35.281905: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
[MaskRCNN] INFO : Loading weights from /workspace/server/tlt-experiments/maskrcnn/experiment_dir_unpruned/model.step-0.tlt
[MaskRCNN] INFO : Loading weights from /workspace/server/tlt-experiments/maskrcnn/experiment_dir_unpruned/model.step-0.tlt
[MaskRCNN] INFO : Horovod successfully initialized …
[MaskRCNN] INFO : Create EncryptCheckpointSaverHook.
[MaskRCNN] INFO : =================================
[MaskRCNN] INFO : Start training cycle 01
[MaskRCNN] INFO : =================================
[MaskRCNN] INFO : Using Dataset Sharding with Horovod
2021-04-12 20:36:46.176423: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-04-12 20:36:46.214550: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-04-12 20:36:46.224296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:0a:00.0
2021-04-12 20:36:46.224364: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:46.225829: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:36:46.227101: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-04-12 20:36:46.227439: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-04-12 20:36:46.230000: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-04-12 20:36:46.231154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-04-12 20:36:46.235485: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-12 20:36:46.244740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-04-12 20:36:46.255034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:41:00.0
2021-04-12 20:36:46.255105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:46.256642: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:36:46.258116: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-04-12 20:36:46.258464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-04-12 20:36:46.260074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-04-12 20:36:46.261275: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-04-12 20:36:46.264812: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-12 20:36:46.267281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_2/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_3/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_4/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_5/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_6/
2021-04-12 20:36:50.084480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:0a:00.0
2021-04-12 20:36:50.084583: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:50.084743: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:36:50.084794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-04-12 20:36:50.084846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-04-12 20:36:50.084881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-04-12 20:36:50.084908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-04-12 20:36:50.084936: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-12 20:36:50.087958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-04-12 20:36:50.087998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:50.201681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:41:00.0
2021-04-12 20:36:50.201912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:50.202349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:36:50.202396: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-04-12 20:36:50.202434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-04-12 20:36:50.202468: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-04-12 20:36:50.202503: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-04-12 20:36:50.202538: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-12 20:36:50.206095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-04-12 20:36:50.206146: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:50.587207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-12 20:36:50.587283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-04-12 20:36:50.587292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-04-12 20:36:50.591190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22514 MB memory) → physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:0a:00.0, compute capability: 7.5)
2021-04-12 20:36:50.637736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-12 20:36:50.637790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-04-12 20:36:50.637798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-04-12 20:36:50.641388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19290 MB memory) → physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5)
Parsing Inputs…
[MaskRCNN] INFO : [Training Compute Statistics] 308.2 GFLOPS/image
2021-04-12 20:36:58.019263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:41:00.0
2021-04-12 20:36:58.019385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:36:58.019514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:36:58.019540: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-04-12 20:36:58.019561: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-04-12 20:36:58.019583: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-04-12 20:36:58.019603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-04-12 20:36:58.019624: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-12 20:36:58.020916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-04-12 20:36:58.020977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-12 20:36:58.020986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-04-12 20:36:58.020991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-04-12 20:36:58.022033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19290 MB memory) → physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5)
2021-04-12 20:37:01.515616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:0a:00.0
2021-04-12 20:37:01.515730: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-04-12 20:37:01.515828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:37:01.515854: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-04-12 20:37:01.515873: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-04-12 20:37:01.515891: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-04-12 20:37:01.515909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-04-12 20:37:01.515928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-12 20:37:01.516951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-04-12 20:37:01.517002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-12 20:37:01.517010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-04-12 20:37:01.517016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-04-12 20:37:01.518064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22514 MB memory) → physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:0a:00.0, compute capability: 7.5)
2021-04-12 20:37:03.052306: W tensorflow/core/framework/dataset.cc:382] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
2021-04-12 20:37:06.632883: W tensorflow/core/framework/dataset.cc:382] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
fatal: Not a git repository (or any parent up to mount point /workspace/server)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /workspace/server)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
[MaskRCNN] INFO : ============================ GIT REPOSITORY ============================
[MaskRCNN] INFO : BRANCH NAME:
[MaskRCNN] INFO : %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[MaskRCNN] INFO : ============================ MODEL STATISTICS ===========================
[MaskRCNN] INFO : # Model Weights: 44,023,253
[MaskRCNN] INFO : # Trainable Weights: 43,970,133
[MaskRCNN] INFO : %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[MaskRCNN] INFO : ============================ TRAINABLE VARIABLES ========================
[MaskRCNN] INFO : [#0001] conv1/kernel:0 => (7, 7, 3, 64)
[MaskRCNN] INFO : [#0002] bn_conv1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0003] bn_conv1/beta:0 => (64,)
[MaskRCNN] INFO : [#0004] block_1a_conv_1/kernel:0 => (1, 1, 64, 64)
[MaskRCNN] INFO : [#0005] block_1a_bn_1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0006] block_1a_bn_1/beta:0 => (64,)
[MaskRCNN] INFO : [#0007] block_1a_conv_2/kernel:0 => (3, 3, 64, 64)
[MaskRCNN] INFO : [#0008] block_1a_bn_2/gamma:0 => (64,)
[MaskRCNN] INFO : [#0009] block_1a_bn_2/beta:0 => (64,)
[MaskRCNN] INFO : [#0010] block_1a_conv_3/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0011] block_1a_bn_3/gamma:0 => (256,)
[MaskRCNN] INFO : [#0012] block_1a_bn_3/beta:0 => (256,)
[MaskRCNN] INFO : [#0013] block_1a_conv_shortcut/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0014] block_1a_bn_shortcut/gamma:0 => (256,)
[MaskRCNN] INFO : [#0015] block_1a_bn_shortcut/beta:0 => (256,)
[MaskRCNN] INFO : [#0016] block_1b_conv_1/kernel:0 => (1, 1, 256, 64)
[MaskRCNN] INFO : [#0017] block_1b_bn_1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0018] block_1b_bn_1/beta:0 => (64,)
[MaskRCNN] INFO : [#0019] block_1b_conv_2/kernel:0 => (3, 3, 64, 64)
[MaskRCNN] INFO : [#0020] block_1b_bn_2/gamma:0 => (64,)
[MaskRCNN] INFO : [#0021] block_1b_bn_2/beta:0 => (64,)
[MaskRCNN] INFO : [#0022] block_1b_conv_3/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0023] block_1b_bn_3/gamma:0 => (256,)
[MaskRCNN] INFO : [#0024] block_1b_bn_3/beta:0 => (256,)
[MaskRCNN] INFO : [#0025] block_1c_conv_1/kernel:0 => (1, 1, 256, 64)
[MaskRCNN] INFO : [#0026] block_1c_bn_1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0027] block_1c_bn_1/beta:0 => (64,)
[MaskRCNN] INFO : [#0028] block_1c_conv_2/kernel:0 => (3, 3, 64, 64)
[MaskRCNN] INFO : [#0029] block_1c_bn_2/gamma:0 => (64,)
[MaskRCNN] INFO : [#0030] block_1c_bn_2/beta:0 => (64,)
[MaskRCNN] INFO : [#0031] block_1c_conv_3/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0032] block_1c_bn_3/gamma:0 => (256,)
[MaskRCNN] INFO : [#0033] block_1c_bn_3/beta:0 => (256,)
[MaskRCNN] INFO : [#0034] block_2a_conv_1/kernel:0 => (1, 1, 256, 128)
[MaskRCNN] INFO : [#0035] block_2a_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0036] block_2a_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0037] block_2a_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0038] block_2a_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0039] block_2a_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0040] block_2a_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0041] block_2a_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0042] block_2a_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0043] block_2a_conv_shortcut/kernel:0 => (1, 1, 256, 512)
[MaskRCNN] INFO : [#0044] block_2a_bn_shortcut/gamma:0 => (512,)
[MaskRCNN] INFO : [#0045] block_2a_bn_shortcut/beta:0 => (512,)
[MaskRCNN] INFO : [#0046] block_2b_conv_1/kernel:0 => (1, 1, 512, 128)
[MaskRCNN] INFO : [#0047] block_2b_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0048] block_2b_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0049] block_2b_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0050] block_2b_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0051] block_2b_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0052] block_2b_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0053] block_2b_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0054] block_2b_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0055] block_2c_conv_1/kernel:0 => (1, 1, 512, 128)
[MaskRCNN] INFO : [#0056] block_2c_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0057] block_2c_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0058] block_2c_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0059] block_2c_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0060] block_2c_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0061] block_2c_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0062] block_2c_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0063] block_2c_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0064] block_2d_conv_1/kernel:0 => (1, 1, 512, 128)
[MaskRCNN] INFO : [#0065] block_2d_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0066] block_2d_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0067] block_2d_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0068] block_2d_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0069] block_2d_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0070] block_2d_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0071] block_2d_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0072] block_2d_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0073] block_3a_conv_1/kernel:0 => (1, 1, 512, 256)
[MaskRCNN] INFO : [#0074] block_3a_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0075] block_3a_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0076] block_3a_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0077] block_3a_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0078] block_3a_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0079] block_3a_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0080] block_3a_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0081] block_3a_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0082] block_3a_conv_shortcut/kernel:0 => (1, 1, 512, 1024)
[MaskRCNN] INFO : [#0083] block_3a_bn_shortcut/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0084] block_3a_bn_shortcut/beta:0 => (1024,)
[MaskRCNN] INFO : [#0085] block_3b_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0086] block_3b_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0087] block_3b_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0088] block_3b_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0089] block_3b_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0090] block_3b_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0091] block_3b_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0092] block_3b_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0093] block_3b_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0094] block_3c_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0095] block_3c_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0096] block_3c_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0097] block_3c_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0098] block_3c_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0099] block_3c_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0100] block_3c_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0101] block_3c_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0102] block_3c_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0103] block_3d_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0104] block_3d_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0105] block_3d_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0106] block_3d_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0107] block_3d_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0108] block_3d_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0109] block_3d_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0110] block_3d_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0111] block_3d_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0112] block_3e_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0113] block_3e_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0114] block_3e_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0115] block_3e_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0116] block_3e_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0117] block_3e_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0118] block_3e_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0119] block_3e_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0120] block_3e_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0121] block_3f_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0122] block_3f_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0123] block_3f_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0124] block_3f_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0125] block_3f_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0126] block_3f_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0127] block_3f_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0128] block_3f_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0129] block_3f_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0130] block_4a_conv_1/kernel:0 => (1, 1, 1024, 512)
[MaskRCNN] INFO : [#0131] block_4a_bn_1/gamma:0 => (512,)
[MaskRCNN] INFO : [#0132] block_4a_bn_1/beta:0 => (512,)
[MaskRCNN] INFO : [#0133] block_4a_conv_2/kernel:0 => (3, 3, 512, 512)
[MaskRCNN] INFO : [#0134] block_4a_bn_2/gamma:0 => (512,)
[MaskRCNN] INFO : [#0135] block_4a_bn_2/beta:0 => (512,)
[MaskRCNN] INFO : [#0136] block_4a_conv_3/kernel:0 => (1, 1, 512, 2048)
[MaskRCNN] INFO : [#0137] block_4a_bn_3/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0138] block_4a_bn_3/beta:0 => (2048,)
[MaskRCNN] INFO : [#0139] block_4a_conv_shortcut/kernel:0 => (1, 1, 1024, 2048)
[MaskRCNN] INFO : [#0140] block_4a_bn_shortcut/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0141] block_4a_bn_shortcut/beta:0 => (2048,)
[MaskRCNN] INFO : [#0142] block_4b_conv_1/kernel:0 => (1, 1, 2048, 512)
[MaskRCNN] INFO : [#0143] block_4b_bn_1/gamma:0 => (512,)
[MaskRCNN] INFO : [#0144] block_4b_bn_1/beta:0 => (512,)
[MaskRCNN] INFO : [#0145] block_4b_conv_2/kernel:0 => (3, 3, 512, 512)
[MaskRCNN] INFO : [#0146] block_4b_bn_2/gamma:0 => (512,)
[MaskRCNN] INFO : [#0147] block_4b_bn_2/beta:0 => (512,)
[MaskRCNN] INFO : [#0148] block_4b_conv_3/kernel:0 => (1, 1, 512, 2048)
[MaskRCNN] INFO : [#0149] block_4b_bn_3/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0150] block_4b_bn_3/beta:0 => (2048,)
[MaskRCNN] INFO : [#0151] block_4c_conv_1/kernel:0 => (1, 1, 2048, 512)
[MaskRCNN] INFO : [#0152] block_4c_bn_1/gamma:0 => (512,)
[MaskRCNN] INFO : [#0153] block_4c_bn_1/beta:0 => (512,)
[MaskRCNN] INFO : [#0154] block_4c_conv_2/kernel:0 => (3, 3, 512, 512)
[MaskRCNN] INFO : [#0155] block_4c_bn_2/gamma:0 => (512,)
[MaskRCNN] INFO : [#0156] block_4c_bn_2/beta:0 => (512,)
[MaskRCNN] INFO : [#0157] block_4c_conv_3/kernel:0 => (1, 1, 512, 2048)
[MaskRCNN] INFO : [#0158] block_4c_bn_3/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0159] block_4c_bn_3/beta:0 => (2048,)
[MaskRCNN] INFO : [#0160] fpn/l2/kernel:0 => (1, 1, 256, 256)
[MaskRCNN] INFO : [#0161] fpn/l2/bias:0 => (256,)
[MaskRCNN] INFO : [#0162] fpn/l3/kernel:0 => (1, 1, 512, 256)
[MaskRCNN] INFO : [#0163] fpn/l3/bias:0 => (256,)
[MaskRCNN] INFO : [#0164] fpn/l4/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0165] fpn/l4/bias:0 => (256,)
[MaskRCNN] INFO : [#0166] fpn/l5/kernel:0 => (1, 1, 2048, 256)
[MaskRCNN] INFO : [#0167] fpn/l5/bias:0 => (256,)
[MaskRCNN] INFO : [#0168] fpn/post_hoc_d2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0169] fpn/post_hoc_d2/bias:0 => (256,)
[MaskRCNN] INFO : [#0170] fpn/post_hoc_d3/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0171] fpn/post_hoc_d3/bias:0 => (256,)
[MaskRCNN] INFO : [#0172] fpn/post_hoc_d4/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0173] fpn/post_hoc_d4/bias:0 => (256,)
[MaskRCNN] INFO : [#0174] fpn/post_hoc_d5/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0175] fpn/post_hoc_d5/bias:0 => (256,)
[MaskRCNN] INFO : [#0176] rpn_head/rpn/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0177] rpn_head/rpn/bias:0 => (256,)
[MaskRCNN] INFO : [#0178] rpn_head/rpn-class/kernel:0 => (1, 1, 256, 3)
[MaskRCNN] INFO : [#0179] rpn_head/rpn-class/bias:0 => (3,)
[MaskRCNN] INFO : [#0180] rpn_head/rpn-box/kernel:0 => (1, 1, 256, 12)
[MaskRCNN] INFO : [#0181] rpn_head/rpn-box/bias:0 => (12,)
[MaskRCNN] INFO : [#0182] box_head/fc6/kernel:0 => (12544, 1024)
[MaskRCNN] INFO : [#0183] box_head/fc6/bias:0 => (1024,)
[MaskRCNN] INFO : [#0184] box_head/fc7/kernel:0 => (1024, 1024)
[MaskRCNN] INFO : [#0185] box_head/fc7/bias:0 => (1024,)
[MaskRCNN] INFO : [#0186] box_head/class-predict/kernel:0 => (1024, 1)
[MaskRCNN] INFO : [#0187] box_head/class-predict/bias:0 => (1,)
[MaskRCNN] INFO : [#0188] box_head/box-predict/kernel:0 => (1024, 4)
[MaskRCNN] INFO : [#0189] box_head/box-predict/bias:0 => (4,)
[MaskRCNN] INFO : [#0190] mask_head/mask-conv-l0/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0191] mask_head/mask-conv-l0/bias:0 => (256,)
[MaskRCNN] INFO : [#0192] mask_head/mask-conv-l1/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0193] mask_head/mask-conv-l1/bias:0 => (256,)
[MaskRCNN] INFO : [#0194] mask_head/mask-conv-l2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0195] mask_head/mask-conv-l2/bias:0 => (256,)
[MaskRCNN] INFO : [#0196] mask_head/mask-conv-l3/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0197] mask_head/mask-conv-l3/bias:0 => (256,)
[MaskRCNN] INFO : [#0198] mask_head/conv5-mask/kernel:0 => (2, 2, 256, 256)
[MaskRCNN] INFO : [#0199] mask_head/conv5-mask/bias:0 => (256,)
[MaskRCNN] INFO : [#0200] mask_head/mask_fcn_logits/kernel:0 => (1, 1, 256, 1)
[MaskRCNN] INFO : [#0201] mask_head/mask_fcn_logits/bias:0 => (1,)
[MaskRCNN] INFO : %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[MaskRCNN] INFO : # ============================================= #
[MaskRCNN] INFO : Start Training
[MaskRCNN] INFO : # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #
[GPU 00] Restoring pretrained weights (307 Tensors) from: /tmp/tmp3zazdm1d/model.ckpt-0
[MaskRCNN] INFO : Pretrained weights loaded with success…
2021-04-12 20:37:14.778236: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
[MaskRCNN] INFO : Saving checkpoints for 0 into /workspace/server/tlt-experiments/maskrcnn/experiment_dir_unpruned/model.step-0.tlt.
2021-04-12 20:37:29.225699: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-04-12 20:37:29.779242: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
print("To resume training from a checkpoint, simply run the same training script. It will pick up from where it's left.")
!tlt-train mask_rcnn -e $SPECS_DIR/maskrcnn_train_resnet50.txt