Mask R-CNN Training Jupyter Notebook: Model Quality and Multiple-GPU Changes

I'm working through the Mask R-CNN notebook example using two GTX 1080 GPUs on Ubuntu, with the batch size reduced to 1.

At the end of this post is the error message I get during training: the primary job terminated normally, but one process returned a non-zero exit code, so the job was aborted. My concern is that the resulting trained model isn't good.

I went ahead and ran step 4 (evaluate trained models), which is written to test against model.step-25000.tlt as a parameter. Overall, training seems faster than I would expect, so I'm not sure I'm actually reaching step 25,000. I reran the training multiple times, halving the image size or setting the batch size to 1, to see if that would help.

Looking at the masks on the images from section 4 (evaluate trained models), the model does a reasonably good job on people, though far from perfect. It loves zebras, finds bowls and cups where there are none, and classifies cabinets and picture frames as TVs. Not a good model.

If the config file for the job expects 8 GPUs (the notebook trains on 2 GPUs per its parameters, but references 4 when describing how to change the learning rate), should the total number of steps, 25,000 by default for 8 GPUs, be increased to 100,000 when using 2 GPUs?
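One way to reason about the step count: if the goal is to show the model the same number of images as the reference 8-GPU run, total steps scale inversely with the global batch size (per-GPU batch size × number of GPUs). The sketch below does that arithmetic. The reference per-GPU batch size in the example is a hypothetical value, not one read from the notebook's spec file. The learning-rate helper applies the commonly used linear scaling rule (learning rate proportional to global batch size), which is a heuristic rather than a documented TLT requirement.

```python
def equivalent_steps(ref_steps: int, ref_gpus: int, ref_batch_per_gpu: int,
                     new_gpus: int, new_batch_per_gpu: int) -> int:
    """Scale total steps so the new run sees as many images as the reference run."""
    ref_images = ref_steps * ref_gpus * ref_batch_per_gpu
    new_global_batch = new_gpus * new_batch_per_gpu
    # Ceiling division: round up so no images are "lost" to truncation.
    return -(-ref_images // new_global_batch)


def scaled_learning_rate(ref_lr: float, ref_global_batch: int, new_global_batch: int) -> float:
    """Linear scaling rule: learning rate proportional to the global batch size."""
    return ref_lr * new_global_batch / ref_global_batch


if __name__ == "__main__":
    # Hypothetical reference run: 25,000 steps on 8 GPUs with 4 images per GPU.
    print(equivalent_steps(25_000, 8, 4, new_gpus=2, new_batch_per_gpu=1))  # 400000
    # Hypothetical reference learning rate of 0.02 at a global batch of 32.
    print(scaled_learning_rate(0.02, ref_global_batch=32, new_global_batch=2))
```

With the per-GPU batch size unchanged, the formula reduces to the 4× increase asked about (25,000 → 100,000 steps); lowering the per-GPU batch size as well raises the step count further.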

[MaskRCNN] INFO : RPN score loss: 0.00433
DLL 2020-12-21 03:17:12.498561 - Iteration: 13760 RPN score loss : 0.00433
[MaskRCNN] INFO : RPN total loss: 0.01772
DLL 2020-12-21 03:17:12.498695 - Iteration: 13760 RPN total loss : 0.01772
[MaskRCNN] INFO : Total loss: 2.55588
DLL 2020-12-21 03:17:12.498814 - Iteration: 13760 Total loss : 2.55588

/usr/local/bin/tlt-train: line 32: 3898 Killed tlt-train-g1 ${PYTHON_ARGS[*]}

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[4049,1],0]
Exit code: 137
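By POSIX shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9, SIGKILL, which matches the "Killed" line printed by tlt-train above. On Linux, an unexplained SIGKILL during training commonly comes from the kernel's out-of-memory killer reacting to host RAM pressure (a GPU out-of-memory error in TensorFlow is normally reported as an exception instead). That is a likely cause worth checking, for example in `dmesg`, not a confirmed diagnosis. A small sketch that decodes such exit codes:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status; values above 128 mean 'killed by signal (code - 128)'."""
    if code > 128:
        sig = code - 128
        try:
            name = signal.Signals(sig).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"unknown signal {sig}"
        return f"terminated by signal {sig} ({name})"
    return "normal exit" if code == 0 else f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))  # terminated by signal 9 (SIGKILL)
```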

For multiple GPUs, I haven't come across any DeepStream discussion of whether the model is split across the GPUs, or whether each GPU processes its own image during training and the gradients are aggregated as one batch.
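The training log later in this thread reports "Horovod successfully initialized" and "Using Dataset Sharding with Horovod", which points to data parallelism: each GPU holds a full copy of the model, reads its own shard of the data, and the per-GPU gradients are averaged (an allreduce) before every weight update. Assuming train_batch_size is per GPU, each update on 2 GPUs with train_batch_size: 1 averages gradients from 2 images. The toy sketch below simulates that averaging in plain Python for a one-parameter linear model; the model, the round-robin sharding, and all function names are illustrative, not TLT or Horovod code.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(w: float, shard: Sequence[Sample]) -> float:
    """Gradient d/dw of mean((w*x - y)^2) over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def shard_dataset(data: Sequence[Sample], num_workers: int) -> List[List[Sample]]:
    """Round-robin sharding: worker i gets samples i, i + n, i + 2n, ..."""
    return [list(data[i::num_workers]) for i in range(num_workers)]


def data_parallel_step(w: float, data: Sequence[Sample], num_workers: int, lr: float) -> float:
    """One synchronous step: each worker computes a gradient on its shard, then gradients are averaged."""
    grads = [local_gradient(w, shard) for shard in shard_dataset(data, num_workers)]
    avg_grad = sum(grads) / num_workers  # stands in for the allreduce-average
    return w - lr * avg_grad


if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # global batch of 4, true w = 2
    w_two_workers = data_parallel_step(0.0, batch, num_workers=2, lr=0.01)
    w_one_worker = 0.0 - 0.01 * local_gradient(0.0, batch)
    # With equal-size shards, averaging shard gradients reproduces the full-batch gradient.
    print(w_two_workers, w_one_worker, math.isclose(w_two_workers, w_one_worker))
```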

If you run with 2 GPUs, please refer to the spec file in the thread "Poor metric results after retraining maskrcnn using TLT notebook".

Working with the modified parameters for 2 GPUs and the increase to 720,000 steps brings another problem into focus: after a number of iterations, training exits with an error. In this example it happened at approximately 12,000 iterations with train_batch_size: 1. If I try to run with train_batch_size: 2, I get an immediate out-of-memory error.

Tailing the log file in the unpruned directory shows nothing unusual.

Any guidance would be appreciated, as it will take many restarts (roughly 60 at about 12,000 iterations each) to reach 720,000 iterations with train_batch_size = 1.

Tue Dec 22 14:08:14 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
| 28%   35C    P8     6W / 180W |    570MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   35C    P8     5W / 180W |      7MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       879      G   /usr/lib/xorg/Xorg                344MiB |
|    0   N/A  N/A      1078      G   /usr/bin/gnome-shell              222MiB |
|    1   N/A  N/A       879      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
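For a scriptable view of the same memory figures, `nvidia-smi` can print CSV with `--query-gpu=index,name,memory.used,memory.total --format=csv,noheader,nounits`. The sketch below parses that output; the sample string in the demo is transcribed from the table above, not captured from the command. It makes visible that GPU 0, which drives the display (Xorg and gnome-shell), starts with about 570 MiB already in use, which is consistent with TensorFlow creating a 6561 MB device on GPU 0 versus 7127 MB on GPU 1 in the log below.

```python
import subprocess
from dataclasses import dataclass
from typing import List

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


@dataclass
class GpuMemory:
    index: int
    name: str
    used_mib: int
    total_mib: int

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def parse_gpu_memory(csv_text: str) -> List[GpuMemory]:
    """Parse nvidia-smi CSV output (no header, no units) into one record per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, used, total = (field.strip() for field in line.split(","))
        gpus.append(GpuMemory(int(index), name, int(used), int(total)))
    return gpus


def query_gpu_memory() -> List[GpuMemory]:
    """Run nvidia-smi on this machine (requires the NVIDIA driver) and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    # Values transcribed from the nvidia-smi table above.
    sample = "0, GeForce GTX 1080, 570, 8116\n1, GeForce GTX 1080, 7, 8119\n"
    for gpu in parse_gpu_memory(sample):
        print(f"GPU {gpu.index} ({gpu.name}): {gpu.free_mib} MiB free of {gpu.total_mib} MiB")
```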

To resume training from a checkpoint, simply run the same training command; it will pick up from where it left off.
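Before each restart it can help to confirm which step was last saved. Checkpoint files in this thread follow the pattern model.step-<N>.tlt (for example model.step-60000.tlt in the log below), so a small helper can report the highest saved step and estimate how many more restarts a run needs. The file-name pattern is an assumption based on those examples, and the helper names are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming pattern, based on file names like model.step-60000.tlt.
CHECKPOINT_RE = re.compile(r"^model\.step-(\d+)\.tlt$")


def latest_checkpoint_step(experiment_dir: str) -> Optional[int]:
    """Return the highest step among model.step-<N>.tlt files, or None if there are none."""
    steps = [
        int(match.group(1))
        for path in Path(experiment_dir).iterdir()
        if (match := CHECKPOINT_RE.match(path.name))
    ]
    return max(steps, default=None)


def runs_remaining(target_steps: int, current_step: int, steps_per_run: int) -> int:
    """Estimate further restarts needed if each run survives about steps_per_run steps."""
    remaining = max(target_steps - current_step, 0)
    return -(-remaining // steps_per_run)  # ceiling division


if __name__ == "__main__":
    # Demo in a throwaway directory; point latest_checkpoint_step at your own experiment directory.
    with tempfile.TemporaryDirectory() as tmp:
        for step in (50000, 60000):
            Path(tmp, f"model.step-{step}.tlt").touch()
        current = latest_checkpoint_step(tmp)
        print(current, runs_remaining(720_000, current, 12_000))  # 60000 55
```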
2020-12-22 16:02:18.347718: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:18.347706: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
[MaskRCNN] INFO : Loading weights from /workspace/tlt-experiments/maskrcnn/exp1/experiment_dir_unpruned/model.step-60000.tlt
[MaskRCNN] INFO : Loading weights from /workspace/tlt-experiments/maskrcnn/exp1/experiment_dir_unpruned/model.step-60000.tlt
[MaskRCNN] INFO : Horovod successfully initialized …
[MaskRCNN] INFO : Create EncryptCheckpointSaverHook.

[MaskRCNN] INFO : =================================
[MaskRCNN] INFO : Start training cycle 02
[MaskRCNN] INFO : =================================

[MaskRCNN] INFO : Using Dataset Sharding with Horovod
2020-12-22 16:02:41.815815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-22 16:02:41.815822: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-22 16:02:42.260555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:42.260560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:42.262235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:03:00.0
2020-12-22 16:02:42.262280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:42.262341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2020-12-22 16:02:42.262384: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:43.219502: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:02:43.219500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:02:43.244121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-22 16:02:43.244114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-22 16:02:43.251738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-22 16:02:43.251766: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-22 16:02:43.314964: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-22 16:02:43.314953: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-22 16:02:43.351668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-22 16:02:43.351722: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-22 16:02:43.460949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:02:43.460917: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:02:43.461392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:43.461392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:43.465569: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:43.465691: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:43.469044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-22 16:02:43.469088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_2/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_3/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_4/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_5/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_6/
2020-12-22 16:02:47.015848: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:47.016443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2020-12-22 16:02:47.016473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:47.016519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:02:47.016538: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-22 16:02:47.016554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-22 16:02:47.016571: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-22 16:02:47.016587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-22 16:02:47.016604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:02:47.016663: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:47.017231: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:47.017684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-22 16:02:47.045055: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:47.045655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:03:00.0
2020-12-22 16:02:47.045698: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:47.045741: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:02:47.045760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-22 16:02:47.045776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-22 16:02:47.045792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-22 16:02:47.045809: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-22 16:02:47.045826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:02:47.045885: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:47.046482: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:47.046938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-22 16:02:47.047234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:47.047216: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:49.255618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-22 16:02:49.255657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-22 16:02:49.255664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-22 16:02:49.255853: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:49.256314: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:49.256675: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:49.256918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-22 16:02:49.256936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-22 16:02:49.256946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-22 16:02:49.257069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6561 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-22 16:02:49.257107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:49.257497: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:49.257848: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:49.258217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7127 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Parsing Inputs…
[MaskRCNN] INFO : [Training Compute Statistics] 547.2 GFLOPS/image
2020-12-22 16:02:56.309606: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:56.309878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:03:00.0
2020-12-22 16:02:56.309910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:56.309976: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:02:56.310002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-22 16:02:56.310023: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-22 16:02:56.310043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-22 16:02:56.310064: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-22 16:02:56.310084: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:02:56.310149: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:56.310401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:56.310610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-22 16:02:56.310631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-22 16:02:56.310638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-22 16:02:56.310643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-22 16:02:56.310708: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:56.310957: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:56.311173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7127 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-12-22 16:02:59.242501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:59.242819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2020-12-22 16:02:59.242859: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-22 16:02:59.242906: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:02:59.242926: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-22 16:02:59.242949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-22 16:02:59.242966: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-22 16:02:59.242983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-22 16:02:59.243000: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:02:59.243059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:59.243344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:59.243563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-22 16:02:59.243587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-22 16:02:59.243594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-22 16:02:59.243602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-22 16:02:59.243683: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:59.243953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-22 16:02:59.244180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6561 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-22 16:03:01.747991: W tensorflow/core/framework/dataset.cc:382] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
2020-12-22 16:03:04.204327: W tensorflow/core/framework/dataset.cc:382] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
[MaskRCNN] INFO : ============================ GIT REPOSITORY ============================
[MaskRCNN] INFO : BRANCH NAME:
[MaskRCNN] INFO : %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[MaskRCNN] INFO : ============================ MODEL STATISTICS ===========================
[MaskRCNN] INFO : # Model Weights: 44,507,633
[MaskRCNN] INFO : # Trainable Weights: 44,454,513
[MaskRCNN] INFO : %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[MaskRCNN] INFO : ============================ TRAINABLE VARIABLES ========================
[MaskRCNN] INFO : [#0001] conv1/kernel:0 => (7, 7, 3, 64)
[MaskRCNN] INFO : [#0002] bn_conv1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0003] bn_conv1/beta:0 => (64,)
[MaskRCNN] INFO : [#0004] block_1a_conv_1/kernel:0 => (1, 1, 64, 64)
[MaskRCNN] INFO : [#0005] block_1a_bn_1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0006] block_1a_bn_1/beta:0 => (64,)
[MaskRCNN] INFO : [#0007] block_1a_conv_2/kernel:0 => (3, 3, 64, 64)
[MaskRCNN] INFO : [#0008] block_1a_bn_2/gamma:0 => (64,)
[MaskRCNN] INFO : [#0009] block_1a_bn_2/beta:0 => (64,)
[MaskRCNN] INFO : [#0010] block_1a_conv_3/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0011] block_1a_bn_3/gamma:0 => (256,)
[MaskRCNN] INFO : [#0012] block_1a_bn_3/beta:0 => (256,)
[MaskRCNN] INFO : [#0013] block_1a_conv_shortcut/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0014] block_1a_bn_shortcut/gamma:0 => (256,)
[MaskRCNN] INFO : [#0015] block_1a_bn_shortcut/beta:0 => (256,)
[MaskRCNN] INFO : [#0016] block_1b_conv_1/kernel:0 => (1, 1, 256, 64)
[MaskRCNN] INFO : [#0017] block_1b_bn_1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0018] block_1b_bn_1/beta:0 => (64,)
[MaskRCNN] INFO : [#0019] block_1b_conv_2/kernel:0 => (3, 3, 64, 64)
[MaskRCNN] INFO : [#0020] block_1b_bn_2/gamma:0 => (64,)
[MaskRCNN] INFO : [#0021] block_1b_bn_2/beta:0 => (64,)
[MaskRCNN] INFO : [#0022] block_1b_conv_3/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0023] block_1b_bn_3/gamma:0 => (256,)
[MaskRCNN] INFO : [#0024] block_1b_bn_3/beta:0 => (256,)
[MaskRCNN] INFO : [#0025] block_1c_conv_1/kernel:0 => (1, 1, 256, 64)
[MaskRCNN] INFO : [#0026] block_1c_bn_1/gamma:0 => (64,)
[MaskRCNN] INFO : [#0027] block_1c_bn_1/beta:0 => (64,)
[MaskRCNN] INFO : [#0028] block_1c_conv_2/kernel:0 => (3, 3, 64, 64)
[MaskRCNN] INFO : [#0029] block_1c_bn_2/gamma:0 => (64,)
[MaskRCNN] INFO : [#0030] block_1c_bn_2/beta:0 => (64,)
[MaskRCNN] INFO : [#0031] block_1c_conv_3/kernel:0 => (1, 1, 64, 256)
[MaskRCNN] INFO : [#0032] block_1c_bn_3/gamma:0 => (256,)
[MaskRCNN] INFO : [#0033] block_1c_bn_3/beta:0 => (256,)
[MaskRCNN] INFO : [#0034] block_2a_conv_1/kernel:0 => (1, 1, 256, 128)
[MaskRCNN] INFO : [#0035] block_2a_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0036] block_2a_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0037] block_2a_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0038] block_2a_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0039] block_2a_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0040] block_2a_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0041] block_2a_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0042] block_2a_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0043] block_2a_conv_shortcut/kernel:0 => (1, 1, 256, 512)
[MaskRCNN] INFO : [#0044] block_2a_bn_shortcut/gamma:0 => (512,)
[MaskRCNN] INFO : [#0045] block_2a_bn_shortcut/beta:0 => (512,)
[MaskRCNN] INFO : [#0046] block_2b_conv_1/kernel:0 => (1, 1, 512, 128)
[MaskRCNN] INFO : [#0047] block_2b_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0048] block_2b_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0049] block_2b_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0050] block_2b_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0051] block_2b_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0052] block_2b_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0053] block_2b_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0054] block_2b_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0055] block_2c_conv_1/kernel:0 => (1, 1, 512, 128)
[MaskRCNN] INFO : [#0056] block_2c_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0057] block_2c_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0058] block_2c_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0059] block_2c_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0060] block_2c_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0061] block_2c_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0062] block_2c_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0063] block_2c_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0064] block_2d_conv_1/kernel:0 => (1, 1, 512, 128)
[MaskRCNN] INFO : [#0065] block_2d_bn_1/gamma:0 => (128,)
[MaskRCNN] INFO : [#0066] block_2d_bn_1/beta:0 => (128,)
[MaskRCNN] INFO : [#0067] block_2d_conv_2/kernel:0 => (3, 3, 128, 128)
[MaskRCNN] INFO : [#0068] block_2d_bn_2/gamma:0 => (128,)
[MaskRCNN] INFO : [#0069] block_2d_bn_2/beta:0 => (128,)
[MaskRCNN] INFO : [#0070] block_2d_conv_3/kernel:0 => (1, 1, 128, 512)
[MaskRCNN] INFO : [#0071] block_2d_bn_3/gamma:0 => (512,)
[MaskRCNN] INFO : [#0072] block_2d_bn_3/beta:0 => (512,)
[MaskRCNN] INFO : [#0073] block_3a_conv_1/kernel:0 => (1, 1, 512, 256)
[MaskRCNN] INFO : [#0074] block_3a_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0075] block_3a_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0076] block_3a_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0077] block_3a_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0078] block_3a_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0079] block_3a_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0080] block_3a_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0081] block_3a_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0082] block_3a_conv_shortcut/kernel:0 => (1, 1, 512, 1024)
[MaskRCNN] INFO : [#0083] block_3a_bn_shortcut/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0084] block_3a_bn_shortcut/beta:0 => (1024,)
[MaskRCNN] INFO : [#0085] block_3b_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0086] block_3b_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0087] block_3b_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0088] block_3b_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0089] block_3b_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0090] block_3b_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0091] block_3b_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0092] block_3b_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0093] block_3b_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0094] block_3c_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0095] block_3c_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0096] block_3c_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0097] block_3c_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0098] block_3c_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0099] block_3c_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0100] block_3c_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0101] block_3c_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0102] block_3c_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0103] block_3d_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0104] block_3d_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0105] block_3d_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0106] block_3d_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0107] block_3d_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0108] block_3d_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0109] block_3d_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0110] block_3d_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0111] block_3d_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0112] block_3e_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0113] block_3e_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0114] block_3e_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0115] block_3e_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0116] block_3e_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0117] block_3e_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0118] block_3e_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0119] block_3e_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0120] block_3e_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0121] block_3f_conv_1/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0122] block_3f_bn_1/gamma:0 => (256,)
[MaskRCNN] INFO : [#0123] block_3f_bn_1/beta:0 => (256,)
[MaskRCNN] INFO : [#0124] block_3f_conv_2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0125] block_3f_bn_2/gamma:0 => (256,)
[MaskRCNN] INFO : [#0126] block_3f_bn_2/beta:0 => (256,)
[MaskRCNN] INFO : [#0127] block_3f_conv_3/kernel:0 => (1, 1, 256, 1024)
[MaskRCNN] INFO : [#0128] block_3f_bn_3/gamma:0 => (1024,)
[MaskRCNN] INFO : [#0129] block_3f_bn_3/beta:0 => (1024,)
[MaskRCNN] INFO : [#0130] block_4a_conv_1/kernel:0 => (1, 1, 1024, 512)
[MaskRCNN] INFO : [#0131] block_4a_bn_1/gamma:0 => (512,)
[MaskRCNN] INFO : [#0132] block_4a_bn_1/beta:0 => (512,)
[MaskRCNN] INFO : [#0133] block_4a_conv_2/kernel:0 => (3, 3, 512, 512)
[MaskRCNN] INFO : [#0134] block_4a_bn_2/gamma:0 => (512,)
[MaskRCNN] INFO : [#0135] block_4a_bn_2/beta:0 => (512,)
[MaskRCNN] INFO : [#0136] block_4a_conv_3/kernel:0 => (1, 1, 512, 2048)
[MaskRCNN] INFO : [#0137] block_4a_bn_3/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0138] block_4a_bn_3/beta:0 => (2048,)
[MaskRCNN] INFO : [#0139] block_4a_conv_shortcut/kernel:0 => (1, 1, 1024, 2048)
[MaskRCNN] INFO : [#0140] block_4a_bn_shortcut/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0141] block_4a_bn_shortcut/beta:0 => (2048,)
[MaskRCNN] INFO : [#0142] block_4b_conv_1/kernel:0 => (1, 1, 2048, 512)
[MaskRCNN] INFO : [#0143] block_4b_bn_1/gamma:0 => (512,)
[MaskRCNN] INFO : [#0144] block_4b_bn_1/beta:0 => (512,)
[MaskRCNN] INFO : [#0145] block_4b_conv_2/kernel:0 => (3, 3, 512, 512)
[MaskRCNN] INFO : [#0146] block_4b_bn_2/gamma:0 => (512,)
[MaskRCNN] INFO : [#0147] block_4b_bn_2/beta:0 => (512,)
[MaskRCNN] INFO : [#0148] block_4b_conv_3/kernel:0 => (1, 1, 512, 2048)
[MaskRCNN] INFO : [#0149] block_4b_bn_3/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0150] block_4b_bn_3/beta:0 => (2048,)
[MaskRCNN] INFO : [#0151] block_4c_conv_1/kernel:0 => (1, 1, 2048, 512)
[MaskRCNN] INFO : [#0152] block_4c_bn_1/gamma:0 => (512,)
[MaskRCNN] INFO : [#0153] block_4c_bn_1/beta:0 => (512,)
[MaskRCNN] INFO : [#0154] block_4c_conv_2/kernel:0 => (3, 3, 512, 512)
[MaskRCNN] INFO : [#0155] block_4c_bn_2/gamma:0 => (512,)
[MaskRCNN] INFO : [#0156] block_4c_bn_2/beta:0 => (512,)
[MaskRCNN] INFO : [#0157] block_4c_conv_3/kernel:0 => (1, 1, 512, 2048)
[MaskRCNN] INFO : [#0158] block_4c_bn_3/gamma:0 => (2048,)
[MaskRCNN] INFO : [#0159] block_4c_bn_3/beta:0 => (2048,)
[MaskRCNN] INFO : [#0160] fpn/l2/kernel:0 => (1, 1, 256, 256)
[MaskRCNN] INFO : [#0161] fpn/l2/bias:0 => (256,)
[MaskRCNN] INFO : [#0162] fpn/l3/kernel:0 => (1, 1, 512, 256)
[MaskRCNN] INFO : [#0163] fpn/l3/bias:0 => (256,)
[MaskRCNN] INFO : [#0164] fpn/l4/kernel:0 => (1, 1, 1024, 256)
[MaskRCNN] INFO : [#0165] fpn/l4/bias:0 => (256,)
[MaskRCNN] INFO : [#0166] fpn/l5/kernel:0 => (1, 1, 2048, 256)
[MaskRCNN] INFO : [#0167] fpn/l5/bias:0 => (256,)
[MaskRCNN] INFO : [#0168] fpn/post_hoc_d2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0169] fpn/post_hoc_d2/bias:0 => (256,)
[MaskRCNN] INFO : [#0170] fpn/post_hoc_d3/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0171] fpn/post_hoc_d3/bias:0 => (256,)
[MaskRCNN] INFO : [#0172] fpn/post_hoc_d4/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0173] fpn/post_hoc_d4/bias:0 => (256,)
[MaskRCNN] INFO : [#0174] fpn/post_hoc_d5/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0175] fpn/post_hoc_d5/bias:0 => (256,)
[MaskRCNN] INFO : [#0176] rpn_head/rpn/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0177] rpn_head/rpn/bias:0 => (256,)
[MaskRCNN] INFO : [#0178] rpn_head/rpn-class/kernel:0 => (1, 1, 256, 3)
[MaskRCNN] INFO : [#0179] rpn_head/rpn-class/bias:0 => (3,)
[MaskRCNN] INFO : [#0180] rpn_head/rpn-box/kernel:0 => (1, 1, 256, 12)
[MaskRCNN] INFO : [#0181] rpn_head/rpn-box/bias:0 => (12,)
[MaskRCNN] INFO : [#0182] box_head/fc6/kernel:0 => (12544, 1024)
[MaskRCNN] INFO : [#0183] box_head/fc6/bias:0 => (1024,)
[MaskRCNN] INFO : [#0184] box_head/fc7/kernel:0 => (1024, 1024)
[MaskRCNN] INFO : [#0185] box_head/fc7/bias:0 => (1024,)
[MaskRCNN] INFO : [#0186] box_head/class-predict/kernel:0 => (1024, 91)
[MaskRCNN] INFO : [#0187] box_head/class-predict/bias:0 => (91,)
[MaskRCNN] INFO : [#0188] box_head/box-predict/kernel:0 => (1024, 364)
[MaskRCNN] INFO : [#0189] box_head/box-predict/bias:0 => (364,)
[MaskRCNN] INFO : [#0190] mask_head/mask-conv-l0/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0191] mask_head/mask-conv-l0/bias:0 => (256,)
[MaskRCNN] INFO : [#0192] mask_head/mask-conv-l1/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0193] mask_head/mask-conv-l1/bias:0 => (256,)
[MaskRCNN] INFO : [#0194] mask_head/mask-conv-l2/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0195] mask_head/mask-conv-l2/bias:0 => (256,)
[MaskRCNN] INFO : [#0196] mask_head/mask-conv-l3/kernel:0 => (3, 3, 256, 256)
[MaskRCNN] INFO : [#0197] mask_head/mask-conv-l3/bias:0 => (256,)
[MaskRCNN] INFO : [#0198] mask_head/conv5-mask/kernel:0 => (2, 2, 256, 256)
[MaskRCNN] INFO : [#0199] mask_head/conv5-mask/bias:0 => (256,)
[MaskRCNN] INFO : [#0200] mask_head/mask_fcn_logits/kernel:0 => (1, 1, 256, 91)
[MaskRCNN] INFO : [#0201] mask_head/mask_fcn_logits/bias:0 => (91,)
[MaskRCNN] INFO : %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[MaskRCNN] INFO : # ============================================= #
[MaskRCNN] INFO : Start Training
[MaskRCNN] INFO : # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #

[GPU 00] Restoring pretrained weights (307 Tensors) from: /tmp/tmpwf8z9gf4/model.ckpt-60000
[MaskRCNN] INFO : Pretrained weights loaded with success…

2020-12-22 16:03:09.270489: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:03:20.011847: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 117 of 4096
[MaskRCNN] INFO : Saving checkpoints for 60000 into /workspace/tlt-experiments/maskrcnn/exp1/experiment_dir_unpruned/model.step-60000.tlt.
2020-12-22 16:03:30.164571: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 3248 of 4096
2020-12-22 16:03:30.809275: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:195] Shuffle buffer filled.
2020-12-22 16:03:32.451011: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 16:03:38.965598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-22 16:03:49.359992: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 38 of 4096
2020-12-22 16:03:59.424414: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 2812 of 4096
2020-12-22 16:04:01.651668: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:195] Shuffle buffer filled.
2020-12-22 16:04:02.709640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[MaskRCNN] INFO : timestamp: 1608653057.8927548
[MaskRCNN] INFO : iteration: 60005
DLL 2020-12-22 16:04:17.915451 - iteration : 60005
[MaskRCNN] INFO : throughput: 1.0 samples/sec
DLL 2020-12-22 16:04:17.915634 - Iteration: 60005 throughput : 0.9736721380764456
[MaskRCNN] INFO : ==================== Metrics =====================
[MaskRCNN] INFO : FastRCNN box loss: 0.24076
DLL 2020-12-22 16:04:17.916156 - Iteration: 60005 FastRCNN box loss : 0.24076
[MaskRCNN] INFO : FastRCNN class loss: 0.07063
DLL 2020-12-22 16:04:17.916258 - Iteration: 60005 FastRCNN class loss : 0.07063
[MaskRCNN] INFO : FastRCNN total loss: 0.31138
DLL 2020-12-22 16:04:17.916354 - Iteration: 60005 FastRCNN total loss : 0.31138
[MaskRCNN] INFO : L2 loss: 0.36905
DLL 2020-12-22 16:04:17.916449 - Iteration: 60005 L2 loss : 0.36905
[MaskRCNN] INFO : Learning rate: 0.005
DLL 2020-12-22 16:04:17.916545 - Iteration: 60005 Learning rate : 0.005
[MaskRCNN] INFO : Mask loss: 0.37652
DLL 2020-12-22 16:04:17.916638 - Iteration: 60005 Mask loss : 0.37652
[MaskRCNN] INFO : RPN box loss: 0.0137
DLL 2020-12-22 16:04:17.916729 - Iteration: 60005 RPN box loss : 0.0137
[MaskRCNN] INFO : RPN score loss: 0.01402
DLL 2020-12-22 16:04:17.916819 - Iteration: 60005 RPN score loss : 0.01402
[MaskRCNN] INFO : RPN total loss: 0.02772
DLL 2020-12-22 16:04:17.916908 - Iteration: 60005 RPN total loss : 0.02772
[MaskRCNN] INFO : Total loss: 1.08468
DLL 2020-12-22 16:04:17.916998 - Iteration: 60005 Total loss : 1.08468

[MaskRCNN] INFO : timestamp: 1608653060.737262
[MaskRCNN] INFO : iteration: 60010
DLL 2020-12-22 16:04:20.737456 - iteration : 60010
[MaskRCNN] INFO : throughput: 2.3 samples/sec
DLL 2020-12-22 16:04:20.737564 - Iteration: 60010 throughput : 2.3231455686435747
[MaskRCNN] INFO : ==================== Metrics =====================
[MaskRCNN] INFO : FastRCNN box loss: 0.51139
DLL 2020-12-22 16:04:20.737988 - Iteration: 60010 FastRCNN box loss : 0.51139
[MaskRCNN] INFO : FastRCNN class loss: 0.34193
DLL 2020-12-22 16:04:20.738092 - Iteration: 60010 FastRCNN class loss : 0.34193
[MaskRCNN] INFO : FastRCNN total loss: 0.85332
DLL 2020-12-22 16:04:20.738192 - Iteration: 60010 FastRCNN total loss : 0.85332
[MaskRCNN] INFO : L2 loss: 0.36905
DLL 2020-12-22 16:04:20.738287 - Iteration: 60010 L2 loss : 0.36905
[MaskRCNN] INFO : Learning rate: 0.005
DLL 2020-12-22 16:04:20.738436 - Iteration: 60010 Learning rate : 0.005
[MaskRCNN] INFO : Mask loss: 0.52758
DLL 2020-12-22 16:04:20.738553 - Iteration: 60010 Mask loss : 0.52758
[MaskRCNN] INFO : RPN box loss: 0.05652
DLL 2020-12-22 16:04:20.738649 - Iteration: 60010 RPN box loss : 0.05652
[MaskRCNN] INFO : RPN score loss: 0.0378
DLL 2020-12-22 16:04:20.738742 - Iteration: 60010 RPN score loss : 0.0378
[MaskRCNN] INFO : RPN total loss: 0.09432
DLL 2020-12-22 16:04:20.738835 - Iteration: 60010 RPN total loss : 0.09432
[MaskRCNN] INFO : Total loss: 1.84427
DLL 2020-12-22 16:04:20.738927 - Iteration: 60010 Total loss : 1.84427
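Single logged iterations are noisy. Total loss is 1.08468 at iteration 60005 and 1.84427 five iterations later, so a few pasted lines say little about whether training is converging. Below is a minimal sketch that collects every metric into a series you can average or plot. It assumes every metric uses the `DLL ... - Iteration: N <metric> : <value>` line format shown above. `parse_dll_metrics` and `window_mean` are hypothetical helpers, not part of TLT.

```python
import re
from collections import defaultdict

# Matches DLL metric lines such as:
#   DLL 2020-12-22 16:04:20.738927 - Iteration: 60010 Total loss : 1.84427
# The lowercase "iteration : N" lines carry no metric and are skipped.
DLL_RE = re.compile(r"^DLL \S+ \S+ - Iteration: (\d+) (.+?) : ([-+0-9.eE]+)$")


def parse_dll_metrics(lines):
    """Group DLL log lines into {metric name: [(iteration, value), ...]}."""
    metrics = defaultdict(list)
    for line in lines:
        match = DLL_RE.match(line.strip())
        if match:
            iteration, name, value = match.groups()
            metrics[name].append((int(iteration), float(value)))
    return dict(metrics)


def window_mean(series, window):
    """Mean of the last `window` values of a [(iteration, value), ...] series."""
    tail = [value for _, value in series[-window:]]
    return sum(tail) / len(tail) if tail else float("nan")


if __name__ == "__main__":
    sample = [
        "DLL 2020-12-22 16:04:17.916998 - Iteration: 60005 Total loss : 1.08468",
        "DLL 2020-12-22 16:04:20.737456 - iteration : 60010",
        "DLL 2020-12-22 16:04:20.738927 - Iteration: 60010 Total loss : 1.84427",
    ]
    total = parse_dll_metrics(sample)["Total loss"]
    print(total, window_mean(total, window=2))
```

For a real run, pass the lines of the saved training log instead of `sample`. Comparing `window_mean` over a wide window at different points in training gives a steadier picture than single readings.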

A COUPLE OF ITERATIONS AT THE END, BEFORE THE ERROR

[MaskRCNN] INFO : timestamp: 1608659858.2925525
[MaskRCNN] INFO : iteration: 72940
DLL 2020-12-22 17:57:38.405440 - iteration : 72940
[MaskRCNN] INFO : throughput: 3.9 samples/sec
DLL 2020-12-22 17:57:38.532309 - Iteration: 72940 throughput : 3.872481777341103
[MaskRCNN] INFO : ==================== Metrics =====================
[MaskRCNN] INFO : FastRCNN box loss: 0.34174
DLL 2020-12-22 17:57:38.578164 - Iteration: 72940 FastRCNN box loss : 0.34174
[MaskRCNN] INFO : FastRCNN class loss: 0.7384
DLL 2020-12-22 17:57:38.579095 - Iteration: 72940 FastRCNN class loss : 0.7384
[MaskRCNN] INFO : FastRCNN total loss: 1.08015
DLL 2020-12-22 17:57:38.579924 - Iteration: 72940 FastRCNN total loss : 1.08015
[MaskRCNN] INFO : L2 loss: 0.36534
DLL 2020-12-22 17:57:38.581276 - Iteration: 72940 L2 loss : 0.36534
[MaskRCNN] INFO : Learning rate: 0.005
DLL 2020-12-22 17:57:38.582406 - Iteration: 72940 Learning rate : 0.005
[MaskRCNN] INFO : Mask loss: 0.48719
DLL 2020-12-22 17:57:38.583263 - Iteration: 72940 Mask loss : 0.48719
[MaskRCNN] INFO : RPN box loss: 0.03276
DLL 2020-12-22 17:57:38.583972 - Iteration: 72940 RPN box loss : 0.03276
[MaskRCNN] INFO : RPN score loss: 0.04517
DLL 2020-12-22 17:57:38.584498 - Iteration: 72940 RPN score loss : 0.04517
[MaskRCNN] INFO : RPN total loss: 0.07792
DLL 2020-12-22 17:57:38.585049 - Iteration: 72940 RPN total loss : 0.07792
[MaskRCNN] INFO : Total loss: 2.0106
DLL 2020-12-22 17:57:38.585582 - Iteration: 72940 Total loss : 2.0106

[MaskRCNN] INFO : timestamp: 1608660015.4845078
[MaskRCNN] INFO : iteration: 72945
DLL 2020-12-22 18:00:15.533471 - iteration : 72945
[MaskRCNN] INFO : throughput: 3.9 samples/sec
DLL 2020-12-22 18:00:15.540926 - Iteration: 72945 throughput : 3.8563356106862883
[MaskRCNN] INFO : ==================== Metrics =====================
[MaskRCNN] INFO : FastRCNN box loss: 0.53189
DLL 2020-12-22 18:00:15.791674 - Iteration: 72945 FastRCNN box loss : 0.53189
[MaskRCNN] INFO : FastRCNN class loss: 0.24809
DLL 2020-12-22 18:00:15.941710 - Iteration: 72945 FastRCNN class loss : 0.24809
[MaskRCNN] INFO : FastRCNN total loss: 0.77998
DLL 2020-12-22 18:00:15.945204 - Iteration: 72945 FastRCNN total loss : 0.77998
[MaskRCNN] INFO : L2 loss: 0.36534
DLL 2020-12-22 18:00:15.945805 - Iteration: 72945 L2 loss : 0.36534
[MaskRCNN] INFO : Learning rate: 0.005
DLL 2020-12-22 18:00:15.946381 - Iteration: 72945 Learning rate : 0.005
[MaskRCNN] INFO : Mask loss: 0.45454
DLL 2020-12-22 18:00:15.946936 - Iteration: 72945 Mask loss : 0.45454
[MaskRCNN] INFO : RPN box loss: 0.02068
DLL 2020-12-22 18:00:15.947430 - Iteration: 72945 RPN box loss : 0.02068
[MaskRCNN] INFO : RPN score loss: 0.04317
DLL 2020-12-22 18:00:15.947910 - Iteration: 72945 RPN score loss : 0.04317
[MaskRCNN] INFO : RPN total loss: 0.06384
DLL 2020-12-22 18:00:15.948354 - Iteration: 72945 RPN total loss : 0.06384
[MaskRCNN] INFO : Total loss: 1.66371
DLL 2020-12-22 18:00:15.948847 - Iteration: 72945 Total loss : 1.66371

/usr/local/bin/tlt-train: line 32: 13439 Killed tlt-train-g1 ${PYTHON_ARGS[*]}

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[13466,1],0]
Exit code: 137
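Exit code 137 is informative on its own. By the usual shell convention, a process terminated by signal N is reported with status 128 + N. So 137 means signal 9, SIGKILL, which is the signal the Linux kernel's OOM killer sends. That fits the bare "Killed" message from tlt-train above. Here is a small sketch of the decoding; the helper name is hypothetical:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status as reported by bash or mpirun.

    Statuses above 128 conventionally mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:  # not a known signal number on this platform
            return f"terminated by unknown signal {signum}"
    return f"exited with status {status}"


print(describe_exit_status(137))  # terminated by SIGKILL (on Linux)
```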

I’m afraid it was killed by OOM (out of memory). In the training spec, please replace the pretrained model with your latest saved checkpoint .tlt model, then trigger training again.
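To follow this advice you need the newest model.step-<N>.tlt file in the experiment directory. Sorting by filename can pick the wrong one, because "model.step-9000.tlt" sorts after "model.step-60000.tlt". The sketch below therefore compares step numbers numerically and prints a replacement checkpoint line for the spec. The filename pattern and directory come from the training log above. Whether the checkpoint field accepts a .tlt file is based on the reply above, not something this script checks. The helper names are hypothetical.

```python
import os
import re

# Checkpoint names as they appear in the log, e.g. "model.step-5000.tlt".
CKPT_RE = re.compile(r"^model\.step-(\d+)\.tlt$")


def pick_latest(filenames):
    """Return the checkpoint filename with the highest step number, or None."""
    best_step, best_name = -1, None
    for name in filenames:
        match = CKPT_RE.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name


def checkpoint_spec_line(path):
    """Format the spec line that points training at `path`."""
    return f'checkpoint: "{path}"'


if __name__ == "__main__":
    # Experiment directory taken from the log; adjust to your own layout.
    exp_dir = "/workspace/tlt-experiments/maskrcnn/exp1/experiment_dir_unpruned"
    if os.path.isdir(exp_dir):
        latest = pick_latest(os.listdir(exp_dir))
        if latest:
            print(checkpoint_spec_line(os.path.join(exp_dir, latest)))
        else:
            print("no model.step-*.tlt checkpoints found")
    else:
        print(f"directory not found: {exp_dir}")
```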

OK, I went back to the beginning and did a reset. Using the settings referenced in the 2-GPU config with train_batch_size: 2, I get a near-immediate out-of-memory error. With train_batch_size: 1 and num_steps_per_eval: 5000, I get the output below. I've copied and pasted the full error section, including the details at step 5000.

Any thoughts on memory-related changes I can try? The machine has two GTX 1080 cards and 16 GB of RAM.

2020-12-23 18:29:06.597510: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 6926172160
InUse: 3865136128
MaxInUse: 6088652544
NumAllocs: 14795898
MaxAllocSize: 2302935040

2020-12-23 18:29:06.597529: W tensorflow/core/common_runtime/bfc_allocator.cc:424] _******************************************_________________________________________
2020-12-23 18:29:06.597551: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at pack_op.cc:88 : Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-12-23 18:29:06.729484: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
Using TensorFlow backend.
4 ops no flops stats due to incomplete shapes.
4 ops no flops stats due to incomplete shapes.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node multilevel_crop_and_resize/stack}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[generate_detections/denormalize_box/concat/_1933]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node multilevel_crop_and_resize/stack}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred

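The numbers in this dump can be checked directly. The failing tensor has shape [8, 5, 208, 336, 256] in float32, at 4 bytes per element. The Stats block gives the pool limit and the bytes in use. The calculation below uses only values copied from the log; the helper names are hypothetical.

```python
from functools import reduce
import operator

# Copied from the "Stats:" block in the log (values in bytes).
STATS_TEXT = """\
Limit: 6926172160
InUse: 3865136128
MaxInUse: 6088652544
NumAllocs: 14795898
MaxAllocSize: 2302935040
"""

GIB = 1024 ** 3


def parse_bfc_stats(text):
    """Parse 'Key: integer' lines from a TensorFlow BFC allocator stats dump."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def tensor_bytes(shape, bytes_per_elem=4):
    """Size of a dense tensor; float32 uses 4 bytes per element."""
    return reduce(operator.mul, shape, 1) * bytes_per_elem


stats = parse_bfc_stats(STATS_TEXT)
free = stats["Limit"] - stats["InUse"]
request = tensor_bytes([8, 5, 208, 336, 256])
print(f"not in use: {free / GIB:.2f} GiB, requested: {request / GIB:.2f} GiB")
# not in use: 2.85 GiB, requested: 2.67 GiB
```

The request is 2,862,612,480 bytes, the same figure the allocator reports as "rounded to 2862612480". That is less than the total memory not in use, so this does not look like a simple shortage. The allocator dump further down shows the largest free chunk at 2.66 GiB, just under the 2.67 GiB needed. That suggests fragmentation combined with one very large tensor, though this is a reading of the log rather than a diagnosis reported by TensorFlow.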
!cat $SPECS_DIR/maskrcnn_train_resnet50.txt
seed: 123
use_amp: False
warmup_steps: 1000
checkpoint: "/workspace/tlt-experiments/maskrcnn/exp1/pretrained_resnet50/tlt_instance_segmentation_vresnet50/resnet50.hdf5"
learning_rate_steps: "[36000, 54000]"
learning_rate_decay_levels: "[0.1, 0.01]"
total_steps: 720000
train_batch_size: 2
eval_batch_size: 8
num_steps_per_eval: 5000
momentum: 0.9
l2_weight_decay: 0.00002
warmup_learning_rate: 0.00001
init_learning_rate: 0.005

data_config{
image_size: "(832, 1344)"
augment_input_data: True
eval_samples: 500
training_file_pattern: "/workspace/tlt-experiments/maskrcnn/data/train*.tfrecord"
validation_file_pattern: "/workspace/tlt-experiments/maskrcnn/data/val*.tfrecord"
val_json_file: "/workspace/tlt-experiments/maskrcnn/data/annotations/instances_val2017.json"

# dataset specific parameters
num_classes: 91
skip_crowd_during_training: True

}

maskrcnn_config {
nlayers: 50
arch: "resnet"
freeze_bn: True
freeze_blocks: "[0,1]"
gt_mask_size: 112

# Region Proposal Network
rpn_positive_overlap: 0.7
rpn_negative_overlap: 0.3
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_min_size: 0.

# Proposal layer.
batch_size_per_im: 512
fg_fraction: 0.25
fg_thresh: 0.5
bg_thresh_hi: 0.5
bg_thresh_lo: 0.

# Faster-RCNN heads.
fast_rcnn_mlp_head_dim: 1024
bbox_reg_weights: "(10., 10., 5., 5.)"

# Mask-RCNN heads.
include_mask: True
mrcnn_resolution: 28

# training
train_rpn_pre_nms_topn: 2000
train_rpn_post_nms_topn: 1000
train_rpn_nms_threshold: 0.7

# evaluation
test_detections_per_image: 100
test_nms: 0.5
test_rpn_pre_nms_topn: 1000
test_rpn_post_nms_topn: 1000
test_rpn_nms_thresh: 0.7

# model architecture
min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
anchor_scale: 8

# localization loss
rpn_box_loss_weight: 1.0
fast_rcnn_box_loss_weight: 1.0
mrcnn_weight_loss_mask: 1.0

}
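The OOM tensor's shape can be traced back to this spec, which hints at which settings matter. The OOM happened right after "Start evaluation cycle 01", and the leading 8 matches eval_batch_size: 8 rather than train_batch_size. The 5 is the number of FPN levels from min_level: 2 to max_level: 6. The 208 × 336 is image_size (832, 1344) divided by 2^min_level. The 256 is the FPN channel count seen in the fpn/... kernels earlier in the log. Below is a sketch under those assumptions that estimates the tensor size for other settings; the function name is hypothetical.

```python
from functools import reduce
import operator

GIB = 1024 ** 3


def stacked_feature_bytes(batch_size, image_size, min_level=2, max_level=6,
                          fpn_channels=256, bytes_per_elem=4):
    """Estimate the multilevel_crop_and_resize stack for one evaluation batch.

    Assumed layout: [batch, num_levels, H / 2**min_level, W / 2**min_level, channels],
    i.e. every level padded to the size of the finest (min_level) feature map.
    """
    height, width = image_size
    stride = 2 ** min_level
    shape = [batch_size, max_level - min_level + 1,
             height // stride, width // stride, fpn_channels]
    return shape, reduce(operator.mul, shape, 1) * bytes_per_elem


for eval_batch_size in (8, 4, 2, 1):
    shape, nbytes = stacked_feature_bytes(eval_batch_size, (832, 1344))
    print(eval_batch_size, shape, f"{nbytes / GIB:.2f} GiB")
```

On this estimate the tensor scales linearly with the evaluation batch: about 2.67 GiB at 8 and about 0.33 GiB at 1. Lowering eval_batch_size is therefore a reasonable hypothesis to test for the evaluation-time OOM, not a documented fix.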

[MaskRCNN] INFO : timestamp: 1608748114.81256
[MaskRCNN] INFO : iteration: 5000
DLL 2020-12-23 18:28:34.812857 - iteration : 5000
[MaskRCNN] INFO : throughput: 3.9 samples/sec
DLL 2020-12-23 18:28:34.812982 - Iteration: 5000 throughput : 3.917774070720457
[MaskRCNN] INFO : ==================== Metrics =====================
[MaskRCNN] INFO : FastRCNN box loss: 0.32957
DLL 2020-12-23 18:28:34.813430 - Iteration: 5000 FastRCNN box loss : 0.32957
[MaskRCNN] INFO : FastRCNN class loss: 0.24948
DLL 2020-12-23 18:28:34.813532 - Iteration: 5000 FastRCNN class loss : 0.24948
[MaskRCNN] INFO : FastRCNN total loss: 0.57905
DLL 2020-12-23 18:28:34.813641 - Iteration: 5000 FastRCNN total loss : 0.57905
[MaskRCNN] INFO : L2 loss: 0.44329
DLL 2020-12-23 18:28:34.813736 - Iteration: 5000 L2 loss : 0.44329
[MaskRCNN] INFO : Learning rate: 0.005
DLL 2020-12-23 18:28:34.813829 - Iteration: 5000 Learning rate : 0.005
[MaskRCNN] INFO : Mask loss: 0.47228
DLL 2020-12-23 18:28:34.813921 - Iteration: 5000 Mask loss : 0.47228
[MaskRCNN] INFO : RPN box loss: 0.04858
DLL 2020-12-23 18:28:34.814012 - Iteration: 5000 RPN box loss : 0.04858
[MaskRCNN] INFO : RPN score loss: 0.04
DLL 2020-12-23 18:28:34.814102 - Iteration: 5000 RPN score loss : 0.04
[MaskRCNN] INFO : RPN total loss: 0.08858
DLL 2020-12-23 18:28:34.814192 - Iteration: 5000 RPN total loss : 0.08858
[MaskRCNN] INFO : Total loss: 1.5832
DLL 2020-12-23 18:28:34.814281 - Iteration: 5000 Total loss : 1.5832

2020-12-23 18:28:35.351619: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
[MaskRCNN] INFO : Saving checkpoints for 5000 into /workspace/tlt-experiments/maskrcnn/exp1/experiment_dir_unpruned/model.step-5000.tlt.
2020-12-23 18:28:41.873395: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.

[MaskRCNN] INFO : =================================
[MaskRCNN] INFO : Start evaluation cycle 01
[MaskRCNN] INFO : =================================

[MaskRCNN] INFO : Loading weights from /workspace/tlt-experiments/maskrcnn/exp1/experiment_dir_unpruned/model.step-5000.tlt
loading annotations into memory…
Done (t=1.16s)
creating index…
index created!
[MaskRCNN] INFO : [*] Limiting the amount of sample to: 500
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_2/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_3/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_4/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_5/
[MaskRCNN] INFO : [ROI OPs] Using Batched NMS… Scope: multilevel_propose_rois/level_6/
Parsing Inputs…
[MaskRCNN] INFO : [Inference Compute Statistics] 534.2 GFLOPS/image
2020-12-23 18:28:51.069036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-23 18:28:51.071070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2020-12-23 18:28:51.071159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-23 18:28:51.071388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-23 18:28:51.071420: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-23 18:28:51.071440: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-23 18:28:51.071461: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-23 18:28:51.071482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-23 18:28:51.071503: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-23 18:28:51.071568: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-23 18:28:51.071837: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-23 18:28:51.072061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-23 18:28:51.103832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-23 18:28:51.103847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-23 18:28:51.103875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-23 18:28:51.106455: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-23 18:28:51.106767: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-23 18:28:51.107026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6605 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-23 18:28:52.409469: W tensorflow/core/framework/dataset.cc:382] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
2020-12-23 18:28:56.590678: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-12-23 18:29:06.594592: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.67GiB (rounded to 2862612480). Current allocation summary follows.
2020-12-23 18:29:06.594641: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 171, Chunks in use: 171. 42.8KiB allocated for chunks. 42.8KiB in use in bin. 8.6KiB client-requested in use in bin.
2020-12-23 18:29:06.594650: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 38, Chunks in use: 38. 19.0KiB allocated for chunks. 19.0KiB in use in bin. 18.4KiB client-requested in use in bin.
2020-12-23 18:29:06.594656: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 85, Chunks in use: 85. 86.5KiB allocated for chunks. 86.5KiB in use in bin. 85.6KiB client-requested in use in bin.
2020-12-23 18:29:06.594662: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2048): Total Chunks: 51, Chunks in use: 51. 107.8KiB allocated for chunks. 107.8KiB in use in bin. 107.3KiB client-requested in use in bin.
2020-12-23 18:29:06.594668: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4096): Total Chunks: 32, Chunks in use: 32. 128.0KiB allocated for chunks. 128.0KiB in use in bin. 128.0KiB client-requested in use in bin.
2020-12-23 18:29:06.594674: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8192): Total Chunks: 20, Chunks in use: 20. 168.0KiB allocated for chunks. 168.0KiB in use in bin. 168.0KiB client-requested in use in bin.
2020-12-23 18:29:06.594680: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16384): Total Chunks: 12, Chunks in use: 12. 289.5KiB allocated for chunks. 289.5KiB in use in bin. 287.9KiB client-requested in use in bin.
2020-12-23 18:29:06.594686: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (32768): Total Chunks: 2, Chunks in use: 2. 73.5KiB allocated for chunks. 73.5KiB in use in bin. 73.5KiB client-requested in use in bin.
2020-12-23 18:29:06.594692: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (65536): Total Chunks: 22, Chunks in use: 21. 1.74MiB allocated for chunks. 1.64MiB in use in bin. 1.64MiB client-requested in use in bin.
2020-12-23 18:29:06.594697: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (131072): Total Chunks: 9, Chunks in use: 8. 1.22MiB allocated for chunks. 1.09MiB in use in bin. 1.09MiB client-requested in use in bin.
2020-12-23 18:29:06.594702: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (262144): Total Chunks: 27, Chunks in use: 27. 8.31MiB allocated for chunks. 8.31MiB in use in bin. 8.31MiB client-requested in use in bin.
2020-12-23 18:29:06.594708: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (524288): Total Chunks: 14, Chunks in use: 14. 7.50MiB allocated for chunks. 7.50MiB in use in bin. 7.50MiB client-requested in use in bin.
2020-12-23 18:29:06.594714: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1048576): Total Chunks: 34, Chunks in use: 33. 38.29MiB allocated for chunks. 36.84MiB in use in bin. 36.84MiB client-requested in use in bin.
2020-12-23 18:29:06.594720: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (2097152): Total Chunks: 39, Chunks in use: 39. 87.03MiB allocated for chunks. 87.03MiB in use in bin. 87.03MiB client-requested in use in bin.
2020-12-23 18:29:06.594726: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (4194304): Total Chunks: 20, Chunks in use: 20. 98.77MiB allocated for chunks. 98.77MiB in use in bin. 98.77MiB client-requested in use in bin.
2020-12-23 18:29:06.594731: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (8388608): Total Chunks: 8, Chunks in use: 8. 70.00MiB allocated for chunks. 70.00MiB in use in bin. 70.00MiB client-requested in use in bin.
2020-12-23 18:29:06.594737: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-12-23 18:29:06.594742: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (33554432): Total Chunks: 2, Chunks in use: 2. 98.00MiB allocated for chunks. 98.00MiB in use in bin. 98.00MiB client-requested in use in bin.
2020-12-23 18:29:06.594748: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (67108864): Total Chunks: 2, Chunks in use: 0. 192.53MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-12-23 18:29:06.594753: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-12-23 18:29:06.594758: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (268435456): Total Chunks: 7, Chunks in use: 6. 5.86GiB allocated for chunks. 3.20GiB in use in bin. 3.20GiB client-requested in use in bin.
2020-12-23 18:29:06.594763: I tensorflow/core/common_runtime/bfc_allocator.cc:885] Bin for 2.67GiB was 256.00MiB, Chunk State:
2020-12-23 18:29:06.594772: I tensorflow/core/common_runtime/bfc_allocator.cc:891] Size: 2.66GiB | Requested Size: 546.00MiB | in_use: 0 | bin_num: 20, prev: Size: 546.00MiB | Requested Size: 546.00MiB | in_use: 1 | bin_num: -1
2020-12-23 18:29:06.594777: I tensorflow/core/common_runtime/bfc_allocator.cc:898] Next region of size 6926172160
2020-12-23 18:29:06.594783: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000000 next 1 of size 1280
2020-12-23 18:29:06.594787: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000500 next 2 of size 256
2020-12-23 18:29:06.594792: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000600 next 3 of size 256
2020-12-23 18:29:06.594796: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000700 next 4 of size 256
2020-12-23 18:29:06.594800: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000800 next 5 of size 256
2020-12-23 18:29:06.594804: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000900 next 6 of size 256
2020-12-23 18:29:06.594809: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000a00 next 7 of size 256
2020-12-23 18:29:06.594813: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000b00 next 8 of size 256
2020-12-23 18:29:06.594817: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000c00 next 9 of size 256
2020-12-23 18:29:06.594821: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000d00 next 10 of size 256
2020-12-23 18:29:06.594826: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000e00 next 11 of size 256
2020-12-23 18:29:06.594830: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6000f00 next 12 of size 256
2020-12-23 18:29:06.594834: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001000 next 13 of size 256
2020-12-23 18:29:06.594838: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001100 next 14 of size 256
2020-12-23 18:29:06.594843: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001200 next 15 of size 256
2020-12-23 18:29:06.594847: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001300 next 16 of size 256
2020-12-23 18:29:06.594851: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001400 next 17 of size 256
2020-12-23 18:29:06.594855: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001500 next 18 of size 256
2020-12-23 18:29:06.594860: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001600 next 19 of size 256
2020-12-23 18:29:06.594864: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001700 next 20 of size 256
2020-12-23 18:29:06.594868: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001800 next 21 of size 256
2020-12-23 18:29:06.594872: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001900 next 22 of size 256
2020-12-23 18:29:06.594877: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001a00 next 23 of size 256
2020-12-23 18:29:06.594881: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001b00 next 24 of size 256
2020-12-23 18:29:06.594885: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001c00 next 25 of size 256
2020-12-23 18:29:06.594890: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001d00 next 26 of size 256
2020-12-23 18:29:06.594894: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001e00 next 27 of size 256
2020-12-23 18:29:06.594899: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6001f00 next 28 of size 256
2020-12-23 18:29:06.594903: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002000 next 29 of size 256
2020-12-23 18:29:06.594907: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002100 next 30 of size 256
2020-12-23 18:29:06.594912: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002200 next 31 of size 256
2020-12-23 18:29:06.594916: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002300 next 32 of size 256
2020-12-23 18:29:06.594921: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002400 next 33 of size 256
2020-12-23 18:29:06.594925: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002500 next 34 of size 256
2020-12-23 18:29:06.594929: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002600 next 35 of size 256
2020-12-23 18:29:06.594933: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002700 next 36 of size 256
2020-12-23 18:29:06.594938: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002800 next 37 of size 256
2020-12-23 18:29:06.594942: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002900 next 38 of size 1024
2020-12-23 18:29:06.594947: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002d00 next 39 of size 256
2020-12-23 18:29:06.594951: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002e00 next 40 of size 256
2020-12-23 18:29:06.594956: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6002f00 next 41 of size 256
2020-12-23 18:29:06.594960: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003000 next 42 of size 256
2020-12-23 18:29:06.594965: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003100 next 43 of size 256
2020-12-23 18:29:06.594969: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003200 next 44 of size 256
2020-12-23 18:29:06.594973: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003300 next 45 of size 256
2020-12-23 18:29:06.594977: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003400 next 46 of size 256
2020-12-23 18:29:06.594982: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003500 next 47 of size 256
2020-12-23 18:29:06.594986: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003600 next 48 of size 512
2020-12-23 18:29:06.594991: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003800 next 49 of size 256
2020-12-23 18:29:06.594996: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003900 next 50 of size 256
2020-12-23 18:29:06.595000: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003a00 next 51 of size 256
2020-12-23 18:29:06.595004: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003b00 next 52 of size 256
2020-12-23 18:29:06.595009: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003c00 next 53 of size 256
2020-12-23 18:29:06.595013: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6003d00 next 54 of size 2048
2020-12-23 18:29:06.595018: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004500 next 55 of size 256
2020-12-23 18:29:06.595022: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004600 next 56 of size 256
2020-12-23 18:29:06.595026: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004700 next 57 of size 256
2020-12-23 18:29:06.595031: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004800 next 58 of size 1024
2020-12-23 18:29:06.595035: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004c00 next 59 of size 512
2020-12-23 18:29:06.595039: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004e00 next 60 of size 256
2020-12-23 18:29:06.595043: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6004f00 next 61 of size 256
2020-12-23 18:29:06.595048: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005000 next 62 of size 256
2020-12-23 18:29:06.595052: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005100 next 63 of size 256
2020-12-23 18:29:06.595056: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005200 next 64 of size 2048
2020-12-23 18:29:06.595060: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005a00 next 65 of size 256
2020-12-23 18:29:06.595065: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005b00 next 66 of size 256
2020-12-23 18:29:06.595069: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005c00 next 67 of size 256
2020-12-23 18:29:06.595073: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005d00 next 68 of size 256
2020-12-23 18:29:06.595077: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005e00 next 69 of size 256
2020-12-23 18:29:06.595081: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6005f00 next 70 of size 256
2020-12-23 18:29:06.595086: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6006000 next 71 of size 256
2020-12-23 18:29:06.595090: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ee6006100 next 72 of size 256

SECTION DELETED TO ALLOW POSTING

2020-12-23 18:29:06.595700: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136b300 next 647 of size 256
2020-12-23 18:29:06.595704: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136b400 next 437 of size 1024
2020-12-23 18:29:06.595708: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136b800 next 438 of size 256
2020-12-23 18:29:06.595713: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136b900 next 720 of size 1024
2020-12-23 18:29:06.595717: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136bd00 next 721 of size 256
2020-12-23 18:29:06.595721: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136be00 next 235 of size 1024
2020-12-23 18:29:06.595725: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136c200 next 470 of size 512
2020-12-23 18:29:06.595730: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136c400 next 536 of size 3072
2020-12-23 18:29:06.595734: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136d000 next 332 of size 512
2020-12-23 18:29:06.595738: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136d200 next 249 of size 1024
2020-12-23 18:29:06.595742: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136d600 next 250 of size 512
2020-12-23 18:29:06.595747: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136d800 next 289 of size 1024
2020-12-23 18:29:06.595751: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136dc00 next 692 of size 1024
2020-12-23 18:29:06.595755: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136e000 next 377 of size 1024
2020-12-23 18:29:06.595759: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136e400 next 674 of size 4096
2020-12-23 18:29:06.595763: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136f400 next 245 of size 256
2020-12-23 18:29:06.595767: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136f500 next 386 of size 1024
2020-12-23 18:29:06.595772: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136f900 next 471 of size 1024
2020-12-23 18:29:06.595776: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef136fd00 next 266 of size 1024
2020-12-23 18:29:06.595780: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef1370100 next 639 of size 147456
2020-12-23 18:29:06.595784: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef1394100 next 640 of size 65536
2020-12-23 18:29:06.595789: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13a4100 next 481 of size 256
2020-12-23 18:29:06.595793: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13a4200 next 482 of size 1024
2020-12-23 18:29:06.595797: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13a4600 next 265 of size 147456
2020-12-23 18:29:06.595801: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13c8600 next 267 of size 8192
2020-12-23 18:29:06.595805: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13ca600 next 322 of size 1024
2020-12-23 18:29:06.595809: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13caa00 next 758 of size 4096
2020-12-23 18:29:06.595814: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef13cba00 next 343 of size 2097152
2020-12-23 18:29:06.595818: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef15cba00 next 344 of size 4096
2020-12-23 18:29:06.595822: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef15cca00 next 728 of size 256
2020-12-23 18:29:06.595826: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef15ccb00 next 508 of size 1048576
2020-12-23 18:29:06.595831: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16ccb00 next 314 of size 8192
2020-12-23 18:29:06.595835: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16ceb00 next 743 of size 256
2020-12-23 18:29:06.595839: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16cec00 next 641 of size 1024
2020-12-23 18:29:06.595844: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16cf000 next 380 of size 65536
2020-12-23 18:29:06.595848: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16df000 next 381 of size 256
2020-12-23 18:29:06.595852: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16df100 next 457 of size 256
2020-12-23 18:29:06.595856: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16df200 next 778 of size 1024
2020-12-23 18:29:06.595861: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16df600 next 328 of size 512
2020-12-23 18:29:06.595865: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16df800 next 329 of size 2048
2020-12-23 18:29:06.595869: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16e0000 next 210 of size 512
2020-12-23 18:29:06.595873: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16e0200 next 480 of size 1024
2020-12-23 18:29:06.595877: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16e0600 next 362 of size 4096
2020-12-23 18:29:06.595882: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16e1600 next 363 of size 1024
2020-12-23 18:29:06.595886: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16e1a00 next 701 of size 1024
2020-12-23 18:29:06.595890: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef16e1e00 next 702 of size 1048576
2020-12-23 18:29:06.595894: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef17e1e00 next 491 of size 256
2020-12-23 18:29:06.595898: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef17e1f00 next 492 of size 1024
2020-12-23 18:29:06.595902: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef17e2300 next 676 of size 9437184
2020-12-23 18:29:06.595907: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef20e2300 next 497 of size 65536
2020-12-23 18:29:06.595911: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef20f2300 next 226 of size 256
2020-12-23 18:29:06.595915: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef20f2400 next 511 of size 8192
2020-12-23 18:29:06.595919: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef20f4400 next 509 of size 147456
2020-12-23 18:29:06.595923: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2118400 next 510 of size 1024
2020-12-23 18:29:06.595928: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2118800 next 716 of size 512
2020-12-23 18:29:06.595932: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2118a00 next 717 of size 2048
2020-12-23 18:29:06.595936: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2119200 next 555 of size 1024
2020-12-23 18:29:06.595940: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2119600 next 695 of size 4096
2020-12-23 18:29:06.595944: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211a600 next 297 of size 1024
2020-12-23 18:29:06.595948: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211aa00 next 298 of size 4096
2020-12-23 18:29:06.595953: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211ba00 next 498 of size 256
2020-12-23 18:29:06.595957: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211bb00 next 499 of size 16384
2020-12-23 18:29:06.595961: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211fb00 next 335 of size 256
2020-12-23 18:29:06.595965: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211fc00 next 336 of size 256
2020-12-23 18:29:06.595969: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef211fd00 next 503 of size 65536
2020-12-23 18:29:06.595974: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef212fd00 next 504 of size 65536
2020-12-23 18:29:06.595979: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef213fd00 next 684 of size 512
2020-12-23 18:29:06.595983: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef213ff00 next 686 of size 1048576
2020-12-23 18:29:06.595987: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef223ff00 next 605 of size 2048
2020-12-23 18:29:06.595991: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2240700 next 606 of size 1024
2020-12-23 18:29:06.595996: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2240b00 next 487 of size 4096
2020-12-23 18:29:06.596000: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2241b00 next 488 of size 1024
2020-12-23 18:29:06.596004: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2241f00 next 601 of size 524288
2020-12-23 18:29:06.596008: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef22c1f00 next 602 of size 1024
2020-12-23 18:29:06.596012: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef22c2300 next 274 of size 2048
2020-12-23 18:29:06.596017: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef22c2b00 next 275 of size 1024
2020-12-23 18:29:06.596021: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef22c2f00 next 196 of size 262144
2020-12-23 18:29:06.596025: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2302f00 next 777 of size 1024
2020-12-23 18:29:06.596029: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2303300 next 543 of size 2048
2020-12-23 18:29:06.596033: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2303b00 next 544 of size 2359296
2020-12-23 18:29:06.596038: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2543b00 next 582 of size 8192
2020-12-23 18:29:06.596042: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2545b00 next 383 of size 2048
2020-12-23 18:29:06.596046: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2546300 next 742 of size 2359296
2020-12-23 18:29:06.596050: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2786300 next 365 of size 256
2020-12-23 18:29:06.596054: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2786400 next 252 of size 1024
2020-12-23 18:29:06.596058: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2786800 next 253 of size 1024
2020-12-23 18:29:06.596063: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2786c00 next 254 of size 256
2020-12-23 18:29:06.596067: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2786d00 next 398 of size 1024
2020-12-23 18:29:06.596071: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2787100 next 399 of size 256
2020-12-23 18:29:06.596075: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2787200 next 207 of size 256
2020-12-23 18:29:06.596079: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2787300 next 338 of size 256
2020-12-23 18:29:06.596083: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2787400 next 339 of size 512
2020-12-23 18:29:06.596088: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2787600 next 699 of size 2048
2020-12-23 18:29:06.596092: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2787e00 next 461 of size 1024
2020-12-23 18:29:06.596096: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2788200 next 463 of size 4096
2020-12-23 18:29:06.596100: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2789200 next 268 of size 2359296
2020-12-23 18:29:06.596105: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef29c9200 next 269 of size 4096
2020-12-23 18:29:06.596109: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef29ca200 next 369 of size 2359296
2020-12-23 18:29:06.596114: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2c0a200 next 384 of size 512
2020-12-23 18:29:06.596118: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2c0a400 next 337 of size 512
2020-12-23 18:29:06.596122: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2c0a600 next 513 of size 1048576
2020-12-23 18:29:06.596126: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0a600 next 514 of size 4096
2020-12-23 18:29:06.596130: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0b600 next 310 of size 2048
2020-12-23 18:29:06.596134: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0be00 next 311 of size 2048
2020-12-23 18:29:06.596139: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0c600 next 725 of size 2048
2020-12-23 18:29:06.596143: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0ce00 next 468 of size 8192
2020-12-23 18:29:06.596147: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0ee00 next 579 of size 1024
2020-12-23 18:29:06.596151: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0f200 next 580 of size 1024
2020-12-23 18:29:06.596156: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0f600 next 445 of size 256
2020-12-23 18:29:06.596160: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d0f700 next 364 of size 262144
2020-12-23 18:29:06.596164: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2d4f700 next 783 of size 1048576
2020-12-23 18:29:06.596168: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e4f700 next 787 of size 512
2020-12-23 18:29:06.596172: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e4f900 next 417 of size 262144
2020-12-23 18:29:06.596177: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e8f900 next 418 of size 2048
2020-12-23 18:29:06.596181: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e90100 next 258 of size 2048
2020-12-23 18:29:06.596185: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e90900 next 195 of size 2048
2020-12-23 18:29:06.596189: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e91100 next 286 of size 1024
2020-12-23 18:29:06.596193: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e91500 next 287 of size 256
2020-12-23 18:29:06.596198: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e91600 next 493 of size 4096
2020-12-23 18:29:06.596202: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e92600 next 203 of size 1024
2020-12-23 18:29:06.596206: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e92a00 next 754 of size 1024
2020-12-23 18:29:06.596210: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef2e92e00 next 462 of size 2359296
2020-12-23 18:29:06.596214: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d2e00 next 430 of size 2048
2020-12-23 18:29:06.596218: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d3600 next 224 of size 1024
2020-12-23 18:29:06.596222: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d3a00 next 262 of size 2048
2020-12-23 18:29:06.596227: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d4200 next 636 of size 1024
2020-12-23 18:29:06.596231: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d4600 next 642 of size 8192
2020-12-23 18:29:06.596236: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d6600 next 657 of size 512
2020-12-23 18:29:06.596240: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d6800 next 231 of size 4096
2020-12-23 18:29:06.596244: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef30d7800 next 464 of size 1048576
2020-12-23 18:29:06.596248: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef31d7800 next 201 of size 512
2020-12-23 18:29:06.596253: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef31d7a00 next 202 of size 4096
2020-12-23 18:29:06.596257: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef31d8a00 next 650 of size 2097152
2020-12-23 18:29:06.596261: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef33d8a00 next 259 of size 512
2020-12-23 18:29:06.596265: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef33d8c00 next 741 of size 262144
2020-12-23 18:29:06.596270: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3418c00 next 239 of size 2048
2020-12-23 18:29:06.596274: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3419400 next 261 of size 4096
2020-12-23 18:29:06.596278: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef341a400 next 371 of size 2048
2020-12-23 18:29:06.596282: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef341ac00 next 372 of size 1048576
2020-12-23 18:29:06.596287: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef351ac00 next 422 of size 524288
2020-12-23 18:29:06.596291: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359ac00 next 423 of size 1024
2020-12-23 18:29:06.596295: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359b000 next 237 of size 2048
2020-12-23 18:29:06.596299: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359b800 next 238 of size 1024
2020-12-23 18:29:06.596303: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359bc00 next 654 of size 1024
2020-12-23 18:29:06.596308: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359c000 next 655 of size 1024
2020-12-23 18:29:06.596312: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359c400 next 732 of size 512
2020-12-23 18:29:06.596316: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359c600 next 788 of size 2048
2020-12-23 18:29:06.596320: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359ce00 next 529 of size 1024
2020-12-23 18:29:06.596324: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359d200 next 530 of size 512
2020-12-23 18:29:06.596329: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359d400 next 213 of size 1024
2020-12-23 18:29:06.596333: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359d800 next 214 of size 512
2020-12-23 18:29:06.596337: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359da00 next 600 of size 2048
2020-12-23 18:29:06.596341: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359e200 next 489 of size 1024
2020-12-23 18:29:06.596345: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359e600 next 539 of size 1024
2020-12-23 18:29:06.596350: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359ea00 next 540 of size 512
2020-12-23 18:29:06.596354: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef359ec00 next 361 of size 589824
2020-12-23 18:29:06.596358: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef362ec00 next 412 of size 512
2020-12-23 18:29:06.596362: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef362ee00 next 621 of size 1024
2020-12-23 18:29:06.596367: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef362f200 next 622 of size 4096
2020-12-23 18:29:06.596371: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3630200 next 541 of size 2359296
2020-12-23 18:29:06.596376: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3870200 next 222 of size 4194304
2020-12-23 18:29:06.596380: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3c70200 next 228 of size 4096
2020-12-23 18:29:06.596384: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3c71200 next 220 of size 524288
2020-12-23 18:29:06.596388: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3cf1200 next 221 of size 4096
2020-12-23 18:29:06.596392: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3cf2200 next 547 of size 2048
2020-12-23 18:29:06.596397: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3cf2a00 next 713 of size 1024
2020-12-23 18:29:06.596401: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3cf2e00 next 349 of size 2097152
2020-12-23 18:29:06.596405: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3ef2e00 next 350 of size 1024
2020-12-23 18:29:06.596409: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3ef3200 next 312 of size 4096
2020-12-23 18:29:06.596413: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3ef4200 next 313 of size 2048
2020-12-23 18:29:06.596418: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef3ef4a00 next 559 of size 1490944
2020-12-23 18:29:06.596422: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef4060a00 next 560 of size 4194304
2020-12-23 18:29:06.596426: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef4460a00 next 393 of size 512
2020-12-23 18:29:06.596430: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef4460c00 next 395 of size 8192
2020-12-23 18:29:06.596435: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef4462c00 next 321 of size 262144
2020-12-23 18:29:06.596439: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef44a2c00 next 535 of size 1024
2020-12-23 18:29:06.596443: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef44a3000 next 225 of size 262144
2020-12-23 18:29:06.596447: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef44e3000 next 563 of size 51380224
2020-12-23 18:29:06.596452: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e3000 next 755 of size 512
2020-12-23 18:29:06.596456: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e3200 next 428 of size 512
2020-12-23 18:29:06.596460: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e3400 next 729 of size 12288
2020-12-23 18:29:06.596464: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e6400 next 352 of size 2048
2020-12-23 18:29:06.596469: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e6c00 next 236 of size 1024
2020-12-23 18:29:06.596473: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e7000 next 627 of size 2048
2020-12-23 18:29:06.596477: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e7800 next 185 of size 1024
2020-12-23 18:29:06.596481: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef75e7c00 next 186 of size 589824
2020-12-23 18:29:06.596485: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7677c00 next 562 of size 512
2020-12-23 18:29:06.596489: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7677e00 next 653 of size 2048
2020-12-23 18:29:06.596494: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7678600 next 370 of size 1024
2020-12-23 18:29:06.596498: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7678a00 next 232 of size 512
2020-12-23 18:29:06.596503: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7678c00 next 718 of size 1024
2020-12-23 18:29:06.596507: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7679000 next 486 of size 1048576
2020-12-23 18:29:06.596511: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7779000 next 305 of size 4096
2020-12-23 18:29:06.596515: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef777a000 next 306 of size 4096
2020-12-23 18:29:06.596519: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef777b000 next 347 of size 512
2020-12-23 18:29:06.596524: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef777b200 next 348 of size 1024
2020-12-23 18:29:06.596528: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef777b600 next 255 of size 1024
2020-12-23 18:29:06.596532: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef777ba00 next 256 of size 4096
2020-12-23 18:29:06.596536: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef777ca00 next 593 of size 4194304
2020-12-23 18:29:06.596540: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7b7ca00 next 550 of size 372736
2020-12-23 18:29:06.596545: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7bd7a00 next 309 of size 2048
2020-12-23 18:29:06.596549: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7bd8200 next 458 of size 1536
2020-12-23 18:29:06.596553: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7bd8800 next 691 of size 4096
2020-12-23 18:29:06.596557: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7bd9800 next 532 of size 2359296
2020-12-23 18:29:06.596561: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7e19800 next 533 of size 1024
2020-12-23 18:29:06.596565: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7e19c00 next 200 of size 4096
2020-12-23 18:29:06.596570: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7e1ac00 next 230 of size 262144
2020-12-23 18:29:06.596574: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7e5ac00 next 406 of size 2048
2020-12-23 18:29:06.596578: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef7e5b400 next 407 of size 2359296
2020-12-23 18:29:06.596582: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef809b400 next 534 of size 2048
2020-12-23 18:29:06.596586: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef809bc00 next 278 of size 9437184
2020-12-23 18:29:06.596591: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899bc00 next 279 of size 256
2020-12-23 18:29:06.596595: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899bd00 next 208 of size 256
2020-12-23 18:29:06.596599: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899be00 next 212 of size 256
2020-12-23 18:29:06.596603: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899bf00 next 648 of size 1024
2020-12-23 18:29:06.596607: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899c300 next 651 of size 1024
2020-12-23 18:29:06.596612: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899c700 next 763 of size 256
2020-12-23 18:29:06.596616: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899c800 next 750 of size 1024
2020-12-23 18:29:06.596620: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899cc00 next 694 of size 256
2020-12-23 18:29:06.596624: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899cd00 next 276 of size 512
2020-12-23 18:29:06.596629: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899cf00 next 277 of size 256
2020-12-23 18:29:06.596633: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899d000 next 199 of size 1024
2020-12-23 18:29:06.596637: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899d400 next 567 of size 2048
2020-12-23 18:29:06.596641: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899dc00 next 454 of size 4096
2020-12-23 18:29:06.596646: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef899ec00 next 644 of size 1048576
2020-12-23 18:29:06.596650: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8a9ec00 next 677 of size 1024
2020-12-23 18:29:06.596654: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8a9f000 next 376 of size 2359296
2020-12-23 18:29:06.596658: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8cdf000 next 396 of size 1024
2020-12-23 18:29:06.596662: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8cdf400 next 325 of size 4096
2020-12-23 18:29:06.596667: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8ce0400 next 624 of size 2048
2020-12-23 18:29:06.596671: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8ce0c00 next 629 of size 2048
2020-12-23 18:29:06.596675: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8ce1400 next 726 of size 8192
2020-12-23 18:29:06.596679: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef8ce3400 next 518 of size 8388608
2020-12-23 18:29:06.596683: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef94e3400 next 771 of size 2048
2020-12-23 18:29:06.596688: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef94e3c00 next 564 of size 8192
2020-12-23 18:29:06.596692: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef94e5c00 next 241 of size 1024
2020-12-23 18:29:06.596696: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef94e6000 next 242 of size 2048
2020-12-23 18:29:06.596700: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef94e6800 next 671 of size 1024
2020-12-23 18:29:06.596704: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef94e6c00 next 588 of size 2359296
2020-12-23 18:29:06.596708: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9726c00 next 502 of size 2048
2020-12-23 18:29:06.596713: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9727400 next 223 of size 2048
2020-12-23 18:29:06.596717: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9727c00 next 229 of size 2359296
2020-12-23 18:29:06.596721: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9967c00 next 645 of size 2359296
2020-12-23 18:29:06.596725: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9ba7c00 next 401 of size 2359296
2020-12-23 18:29:06.596729: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9de7c00 next 404 of size 1024
2020-12-23 18:29:06.596733: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9de8000 next 194 of size 512
2020-12-23 18:29:06.596738: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9de8200 next 387 of size 4096
2020-12-23 18:29:06.596742: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9de9200 next 357 of size 4096
2020-12-23 18:29:06.596746: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9dea200 next 358 of size 8192
2020-12-23 18:29:06.596750: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9dec200 next 409 of size 2048
2020-12-23 18:29:06.596754: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9deca00 next 264 of size 1048576
2020-12-23 18:29:06.596759: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eeca00 next 405 of size 1024
2020-12-23 18:29:06.596763: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eece00 next 408 of size 1024
2020-12-23 18:29:06.596768: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eed200 next 760 of size 1024
2020-12-23 18:29:06.596772: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eed600 next 762 of size 2048
2020-12-23 18:29:06.596776: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eede00 next 759 of size 1024
2020-12-23 18:29:06.596780: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eee200 next 761 of size 256
2020-12-23 18:29:06.596784: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9eee300 next 500 of size 8192
2020-12-23 18:29:06.596789: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8ef9ef0300 next 672 of size 4194304
2020-12-23 18:29:06.596793: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f0300 next 554 of size 1024
2020-12-23 18:29:06.596797: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f0700 next 693 of size 1024
2020-12-23 18:29:06.596801: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f0b00 next 354 of size 512
2020-12-23 18:29:06.596805: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f0d00 next 300 of size 2048
2020-12-23 18:29:06.596810: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f1500 next 459 of size 8192
2020-12-23 18:29:06.596814: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f3500 next 197 of size 1024
2020-12-23 18:29:06.596818: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa2f3900 next 294 of size 1048576
2020-12-23 18:29:06.596822: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa3f3900 next 730 of size 2359296
2020-12-23 18:29:06.596826: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa633900 next 733 of size 2359296
2020-12-23 18:29:06.596830: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa873900 next 444 of size 1024
2020-12-23 18:29:06.596835: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa873d00 next 723 of size 1024
2020-12-23 18:29:06.596839: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa874100 next 724 of size 93184
2020-12-23 18:29:06.596843: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efa88ad00 next 466 of size 9437184
2020-12-23 18:29:06.596848: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb18ad00 next 467 of size 1048576
2020-12-23 18:29:06.596852: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb28ad00 next 784 of size 512
2020-12-23 18:29:06.596856: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb28af00 next 785 of size 589824
2020-12-23 18:29:06.596860: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31af00 next 595 of size 2048
2020-12-23 18:29:06.596865: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31b700 next 776 of size 256
2020-12-23 18:29:06.596869: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31b800 next 233 of size 256
2020-12-23 18:29:06.596873: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31b900 next 685 of size 256
2020-12-23 18:29:06.596877: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31ba00 next 455 of size 256
2020-12-23 18:29:06.596881: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31bb00 next 340 of size 256
2020-12-23 18:29:06.596885: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31bc00 next 293 of size 256
2020-12-23 18:29:06.596890: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31bd00 next 295 of size 256
2020-12-23 18:29:06.596894: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31be00 next 394 of size 256
2020-12-23 18:29:06.596898: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31bf00 next 678 of size 256
2020-12-23 18:29:06.596903: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c000 next 670 of size 256
2020-12-23 18:29:06.596907: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c100 next 439 of size 256
2020-12-23 18:29:06.596911: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c200 next 705 of size 256
2020-12-23 18:29:06.596915: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c300 next 574 of size 256
2020-12-23 18:29:06.596919: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c400 next 575 of size 1024
2020-12-23 18:29:06.596923: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c800 next 291 of size 256
2020-12-23 18:29:06.596928: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31c900 next 341 of size 256
2020-12-23 18:29:06.596932: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31ca00 next 703 of size 256
2020-12-23 18:29:06.596936: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31cb00 next 704 of size 3328
2020-12-23 18:29:06.596940: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31d800 next 658 of size 3328
2020-12-23 18:29:06.596945: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31e500 next 659 of size 512
2020-12-23 18:29:06.596949: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31e700 next 519 of size 256
2020-12-23 18:29:06.596953: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31e800 next 612 of size 3328
2020-12-23 18:29:06.596957: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31f500 next 355 of size 256
2020-12-23 18:29:06.596961: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31f600 next 656 of size 256
2020-12-23 18:29:06.596966: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb31f700 next 531 of size 1677312
2020-12-23 18:29:06.596970: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb4b8f00 next 475 of size 1677312
2020-12-23 18:29:06.596974: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb652700 next 476 of size 1677312
2020-12-23 18:29:06.596979: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb7ebf00 next 198 of size 1677312
2020-12-23 18:29:06.596983: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb985700 next 246 of size 419328
2020-12-23 18:29:06.596987: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efb9ebd00 next 614 of size 419328
2020-12-23 18:29:06.596991: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efba52300 next 594 of size 419328
2020-12-23 18:29:06.596997: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbab8900 next 681 of size 419328
2020-12-23 18:29:06.597001: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb1ef00 next 682 of size 104960
2020-12-23 18:29:06.597005: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb38900 next 745 of size 104960
2020-12-23 18:29:06.597009: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb52300 next 435 of size 104960
2020-12-23 18:29:06.597013: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb6bd00 next 436 of size 104960
2020-12-23 18:29:06.597018: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb85700 next 773 of size 26368
2020-12-23 18:29:06.597022: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb8be00 next 751 of size 26368
2020-12-23 18:29:06.597026: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb92500 next 734 of size 26368
2020-12-23 18:29:06.597030: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb98c00 next 496 of size 26368
2020-12-23 18:29:06.597035: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f300 next 748 of size 256
2020-12-23 18:29:06.597039: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f400 next 749 of size 256
2020-12-23 18:29:06.597043: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f500 next 469 of size 256
2020-12-23 18:29:06.597047: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f600 next 367 of size 256
2020-12-23 18:29:06.597051: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f700 next 620 of size 256
2020-12-23 18:29:06.597055: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f800 next 442 of size 256
2020-12-23 18:29:06.597060: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9f900 next 443 of size 256
2020-12-23 18:29:06.597064: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9fa00 next 747 of size 256
2020-12-23 18:29:06.597068: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9fb00 next 248 of size 256
2020-12-23 18:29:06.597072: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9fc00 next 284 of size 256
2020-12-23 18:29:06.597076: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9fd00 next 342 of size 256
2020-12-23 18:29:06.597080: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9fe00 next 706 of size 256
2020-12-23 18:29:06.597084: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbb9ff00 next 308 of size 256
2020-12-23 18:29:06.597089: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0000 next 643 of size 256
2020-12-23 18:29:06.597093: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0100 next 526 of size 256
2020-12-23 18:29:06.597097: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0200 next 527 of size 256
2020-12-23 18:29:06.597101: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0300 next 217 of size 256
2020-12-23 18:29:06.597105: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0400 next 218 of size 256
2020-12-23 18:29:06.597109: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0500 next 251 of size 256
2020-12-23 18:29:06.597113: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0600 next 323 of size 256
2020-12-23 18:29:06.597118: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0700 next 552 of size 256
2020-12-23 18:29:06.597122: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0800 next 700 of size 256
2020-12-23 18:29:06.597126: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0900 next 719 of size 256
2020-12-23 18:29:06.597130: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efbba0a00 next 501 of size 6272000
2020-12-23 18:29:06.597135: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efc19be00 next 585 of size 2508800
2020-12-23 18:29:06.597140: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efc400600 next 586 of size 2880000
2020-12-23 18:29:06.597144: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efc6bf800 next 388 of size 2508800
2020-12-23 18:29:06.597148: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efc924000 next 209 of size 6709248
2020-12-23 18:29:06.597153: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efcf8a000 next 273 of size 6709248
2020-12-23 18:29:06.597157: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efd5f0000 next 666 of size 6709248
2020-12-23 18:29:06.597161: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efdc56000 next 667 of size 6709248
2020-12-23 18:29:06.597165: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efe2bc000 next 424 of size 6709248
2020-12-23 18:29:06.597169: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efe922000 next 477 of size 6709248
2020-12-23 18:29:06.597174: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efef88000 next 431 of size 256
2020-12-23 18:29:06.597178: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7f8efef88100 next 366 of size 131328
2020-12-23 18:29:06.597182: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efefa8200 next 205 of size 104960
2020-12-23 18:29:06.597186: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efefc1c00 next 415 of size 104960
2020-12-23 18:29:06.597191: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efefdb600 next 403 of size 104960
2020-12-23 18:29:06.597195: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8efeff5000 next 780 of size 104960
2020-12-23 18:29:06.597199: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff00ea00 next 625 of size 26368
2020-12-23 18:29:06.597203: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff015100 next 576 of size 26368
2020-12-23 18:29:06.597207: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff01b800 next 449 of size 26368
2020-12-23 18:29:06.597212: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff021f00 next 450 of size 26368
2020-12-23 18:29:06.597216: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff028600 next 391 of size 419328
2020-12-23 18:29:06.597220: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7f8eff08ec00 next 770 of size 104960
2020-12-23 18:29:06.597224: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff0a8600 next 581 of size 26368
2020-12-23 18:29:06.597229: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff0aed00 next 569 of size 26368
2020-12-23 18:29:06.597233: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7f8eff0b5400 next 390 of size 1519616
2020-12-23 18:29:06.597237: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff228400 next 587 of size 1677312
2020-12-23 18:29:06.597241: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8eff3c1c00 next 316 of size 6709248
2020-12-23 18:29:06.597246: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8effa27c00 next 775 of size 419328
2020-12-23 18:29:06.597250: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8effa8e200 next 649 of size 419328
2020-12-23 18:29:06.597254: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8effaf4800 next 592 of size 419328
2020-12-23 18:29:06.597258: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8effb5ae00 next 589 of size 419328
2020-12-23 18:29:06.597262: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7f8effbc1400 next 571 of size 94530816
2020-12-23 18:29:06.597266: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f055e8100 next 689 of size 256
2020-12-23 18:29:06.597271: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7f8f055e8200 next 690 of size 107347968
2020-12-23 18:29:06.597275: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f0bc48200 next 584 of size 256
2020-12-23 18:29:06.597279: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f0bc48300 next 327 of size 256
2020-12-23 18:29:06.597284: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f0bc48400 next 668 of size 572522496
2020-12-23 18:29:06.597288: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f2de48400 next 204 of size 572522496
2020-12-23 18:29:06.597292: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f50048400 next 739 of size 572522496
2020-12-23 18:29:06.597297: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f72248400 next 478 of size 572522496
2020-12-23 18:29:06.597301: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8f94448400 next 608 of size 572522496
2020-12-23 18:29:06.597305: I tensorflow/core/common_runtime/bfc_allocator.cc:905] InUse at 0x7f8fb6648400 next 400 of size 572522496
2020-12-23 18:29:06.597309: I tensorflow/core/common_runtime/bfc_allocator.cc:905] Free at 0x7f8fd8848400 next 18446744073709551615 of size 2857401344
2020-12-23 18:29:06.597313: I tensorflow/core/common_runtime/bfc_allocator.cc:914] Summary of in-use Chunks by size:
2020-12-23 18:29:06.597319: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 171 Chunks of size 256 totalling 42.8KiB
2020-12-23 18:29:06.597324: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 38 Chunks of size 512 totalling 19.0KiB
2020-12-23 18:29:06.597329: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 81 Chunks of size 1024 totalling 81.0KiB
2020-12-23 18:29:06.597333: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 1280 totalling 2.5KiB
2020-12-23 18:29:06.597338: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 1536 totalling 3.0KiB
2020-12-23 18:29:06.597343: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 46 Chunks of size 2048 totalling 92.0KiB
2020-12-23 18:29:06.597347: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 3072 totalling 6.0KiB
2020-12-23 18:29:06.597352: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 3 Chunks of size 3328 totalling 9.8KiB
2020-12-23 18:29:06.597357: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 32 Chunks of size 4096 totalling 128.0KiB
2020-12-23 18:29:06.597362: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 18 Chunks of size 8192 totalling 144.0KiB
2020-12-23 18:29:06.597366: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 12288 totalling 24.0KiB
2020-12-23 18:29:06.597371: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 16384 totalling 32.0KiB
2020-12-23 18:29:06.597376: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 10 Chunks of size 26368 totalling 257.5KiB
2020-12-23 18:29:06.597381: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 37632 totalling 73.5KiB
2020-12-23 18:29:06.597386: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 12 Chunks of size 65536 totalling 768.0KiB
2020-12-23 18:29:06.597390: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 93184 totalling 91.0KiB
2020-12-23 18:29:06.597395: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 8 Chunks of size 104960 totalling 820.0KiB
2020-12-23 18:29:06.597400: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 131072 totalling 256.0KiB
2020-12-23 18:29:06.597405: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 6 Chunks of size 147456 totalling 864.0KiB
2020-12-23 18:29:06.597409: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 16 Chunks of size 262144 totalling 4.00MiB
2020-12-23 18:29:06.597414: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 372736 totalling 728.0KiB
2020-12-23 18:29:06.597419: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 9 Chunks of size 419328 totalling 3.60MiB
2020-12-23 18:29:06.597423: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 6 Chunks of size 524288 totalling 3.00MiB
2020-12-23 18:29:06.597428: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 8 Chunks of size 589824 totalling 4.50MiB
2020-12-23 18:29:06.597433: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 26 Chunks of size 1048576 totalling 26.00MiB
2020-12-23 18:29:06.597438: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 1490944 totalling 2.84MiB
2020-12-23 18:29:06.597442: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 5 Chunks of size 1677312 totalling 8.00MiB
2020-12-23 18:29:06.597447: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 6 Chunks of size 2097152 totalling 12.00MiB
2020-12-23 18:29:06.597452: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 30 Chunks of size 2359296 totalling 67.50MiB
2020-12-23 18:29:06.597457: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 2508800 totalling 4.79MiB
2020-12-23 18:29:06.597461: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 2880000 totalling 2.75MiB
2020-12-23 18:29:06.597466: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 12 Chunks of size 4194304 totalling 48.00MiB
2020-12-23 18:29:06.597471: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 1 Chunks of size 6272000 totalling 5.98MiB
2020-12-23 18:29:06.597475: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 7 Chunks of size 6709248 totalling 44.79MiB
2020-12-23 18:29:06.597480: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 8388608 totalling 16.00MiB
2020-12-23 18:29:06.597485: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 6 Chunks of size 9437184 totalling 54.00MiB
2020-12-23 18:29:06.597490: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 2 Chunks of size 51380224 totalling 98.00MiB
2020-12-23 18:29:06.597494: I tensorflow/core/common_runtime/bfc_allocator.cc:917] 6 Chunks of size 572522496 totalling 3.20GiB
2020-12-23 18:29:06.597499: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 3.60GiB
2020-12-23 18:29:06.597503: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 6926172160 memory_limit_: 6926172160 available bytes: 0 curr_region_allocation_bytes_: 13852344320
2020-12-23 18:29:06.597510: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 6926172160
InUse: 3865136128
MaxInUse: 6088652544
NumAllocs: 14795898
MaxAllocSize: 2302935040

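The allocator dump above is easier to read once the Free chunks and the Stats block are pulled out. The sketch below is a hypothetical helper, not part of TLT or TensorFlow. It uses only the six `Free at` lines that appear in this excerpt plus the Stats values. Their sizes add up to exactly `Limit` minus `InUse` (3,061,036,032 bytes), so they account for all free memory on GPU 0 at the moment of failure. Roughly 2.85 GiB is free in total, but the largest single contiguous chunk is only about 2.66 GiB.

```python
import re

# Lines copied from the allocator dump above, with the timestamp and the
# "I tensorflow/core/..." prefix trimmed. Paste the full dump here to scan every chunk.
DUMP = """\
Free at 0x7f8efef88100 next 366 of size 131328
Free at 0x7f8eff08ec00 next 770 of size 104960
Free at 0x7f8eff0b5400 next 390 of size 1519616
Free at 0x7f8effbc1400 next 571 of size 94530816
Free at 0x7f8f055e8200 next 690 of size 107347968
InUse at 0x7f8f0bc48400 next 668 of size 572522496
Free at 0x7f8fd8848400 next 18446744073709551615 of size 2857401344
Limit: 6926172160
InUse: 3865136128
MaxInUse: 6088652544
"""

# One chunk line: state, address, "next" id, size in bytes.
CHUNK_RE = re.compile(r"(InUse|Free) at 0x[0-9a-f]+ next \d+ of size (\d+)")
# Stats block lines such as "Limit: 6926172160" (anchored so "InUse at" is not matched).
STAT_RE = re.compile(r"^(Limit|InUse|MaxInUse): (\d+)$", re.MULTILINE)


def summarize(dump):
    """Return total free bytes, the largest free chunk, and the Stats values."""
    free = [int(size) for state, size in CHUNK_RE.findall(dump) if state == "Free"]
    stats = {name: int(value) for name, value in STAT_RE.findall(dump)}
    return sum(free), max(free, default=0), stats


GIB = 1024 ** 3
total_free, largest_free, stats = summarize(DUMP)
print(f"limit      {stats['Limit'] / GIB:.2f} GiB")
print(f"in use     {stats['InUse'] / GIB:.2f} GiB (peak {stats['MaxInUse'] / GIB:.2f} GiB)")
print(f"free total {total_free / GIB:.2f} GiB, largest contiguous {largest_free / GIB:.2f} GiB")
```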
2020-12-23 18:29:06.597529: W tensorflow/core/common_runtime/bfc_allocator.cc:424] _******************************************_________________________________________
2020-12-23 18:29:06.597551: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at pack_op.cc:88 : Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-12-23 18:29:06.729484: W tensorflow/core/kernels/data/cache_dataset_ops.cc:824] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
Using TensorFlow backend.
4 ops no flops stats due to incomplete shapes.
4 ops no flops stats due to incomplete shapes.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node multilevel_crop_and_resize/stack}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[generate_detections/denormalize_box/concat/_1933]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node multilevel_crop_and_resize/stack}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 58, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 187, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 90, in run_executer
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/distributed_executer.py", line 420, in train_and_eval
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/evaluation.py", line 326, in evaluate
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/evaluation.py", line 128, in compute_coco_eval_metric
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 640, in predict
preds_evaluated = mon_sess.run(predictions)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node multilevel_crop_and_resize/stack (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[generate_detections/denormalize_box/concat/_1933]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[8,5,208,336,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node multilevel_crop_and_resize/stack (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'multilevel_crop_and_resize/stack':
File "usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 58, in main
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 187, in main
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 90, in run_executer
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/distributed_executer.py", line 420, in train_and_eval
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/evaluation.py", line 326, in evaluate
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/evaluation.py", line 128, in compute_coco_eval_metric
File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
features, None, ModeKeys.PREDICT, self.config)
File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 548, in mask_rcnn_model_fn
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 392, in _model_fn
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 227, in build_model_graph
File "home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/ops/spatial_transform_ops.py", line 298, in multilevel_crop_and_resize
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/array_ops.py", line 1154, in stack
return gen_array_ops.pack(values, axis=axis, name=name)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_array_ops.py", line 6303, in pack
"Pack", values=values, axis=axis, name=name)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()

[MaskRCNN] INFO : # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ #
[MaskRCNN] INFO : Training Performance Summary
[MaskRCNN] INFO : # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ #
DLL 2020-12-23 18:29:06.774133 - : # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ #
DLL 2020-12-23 18:29:06.774190 - : Training Performance Summary
DLL 2020-12-23 18:29:06.774221 - : # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ #

DLL 2020-12-23 18:29:06.774259 - Average_throughput : 3.9 samples/sec
DLL 2020-12-23 18:29:06.774288 - Total processed steps : 5000
DLL 2020-12-23 18:29:06.774322 - Total_processing_time : 0h 43m 22s
[MaskRCNN] INFO : Average throughput: 3.9 samples/sec
[MaskRCNN] INFO : Total processed steps: 5000
[MaskRCNN] INFO : Total processing time: 0h 43m 22s
DLL 2020-12-23 18:29:06.774503 - : ==================== Metrics ====================
[MaskRCNN] INFO : ==================== Metrics ====================
[MaskRCNN] INFO : FastRCNN box loss: 0.32957
DLL 2020-12-23 18:29:06.774746 - FastRCNN box loss : 0.32957
[MaskRCNN] INFO : FastRCNN class loss: 0.24948
DLL 2020-12-23 18:29:06.774847 - FastRCNN class loss : 0.24948
[MaskRCNN] INFO : FastRCNN total loss: 0.57905
DLL 2020-12-23 18:29:06.774965 - FastRCNN total loss : 0.57905
[MaskRCNN] INFO : L2 loss: 0.44329
DLL 2020-12-23 18:29:06.775067 - L2 loss : 0.44329
[MaskRCNN] INFO : Learning rate: 0.005
DLL 2020-12-23 18:29:06.775198 - Learning rate : 0.005
[MaskRCNN] INFO : Mask loss: 0.47228
DLL 2020-12-23 18:29:06.775300 - Mask loss : 0.47228
[MaskRCNN] INFO : RPN box loss: 0.04858
DLL 2020-12-23 18:29:06.775408 - RPN box loss : 0.04858
[MaskRCNN] INFO : RPN score loss: 0.04
DLL 2020-12-23 18:29:06.775518 - RPN score loss : 0.04
[MaskRCNN] INFO : RPN total loss: 0.08858
DLL 2020-12-23 18:29:06.775617 - RPN total loss : 0.08858
[MaskRCNN] INFO : Total loss: 1.5832
DLL 2020-12-23 18:29:06.775723 - Total loss : 1.5832

[MaskRCNN] ERROR : Job finished with an uncaught exception: FAILURE

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[19245,1],0]
Exit code: 1
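The training performance summary in the log can be turned into a per-step figure, which bears on the multi-GPU question. The sketch below uses only the three numbers printed in that summary. It assumes that "Average throughput" counts images across both GPUs, which the log does not state. Under that assumption, about 2 images per step would be consistent with data-parallel training at one image per GPU on two GPUs (for example train_batch_size: 1): each GPU process, launched here through mpirun, works on its own image and the gradients are combined, rather than one model being split across the cards.

```python
# Figures copied from the "Training Performance Summary" in the log above.
AVG_THROUGHPUT = 3.9                  # samples/sec, rounded to one decimal in the log
TOTAL_STEPS = 5000                    # "Total processed steps"
HOURS, MINUTES, SECONDS = 0, 43, 22   # "Total processing time: 0h 43m 22s"


def summarize(throughput, steps, h, m, s):
    """Return (total seconds, steps per second, samples per step)."""
    total_seconds = h * 3600 + m * 60 + s
    steps_per_second = steps / total_seconds
    # Total samples seen = throughput * wall time; divide by steps for a per-step count.
    samples_per_step = throughput * total_seconds / steps
    return total_seconds, steps_per_second, samples_per_step


total_seconds, steps_per_second, samples_per_step = summarize(
    AVG_THROUGHPUT, TOTAL_STEPS, HOURS, MINUTES, SECONDS
)
print(f"processing time:  {total_seconds} s")
print(f"steps per second: {steps_per_second:.2f}")
print(f"samples per step: {samples_per_step:.2f}")
```

This gives 2602 s, about 1.92 steps per second and about 2.03 samples per step. Because the throughput in the log is rounded, read the last figure as "about 2" rather than an exact count.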

According to the log, the job runs out of GPU memory (OOM) when you train with the two 1080 cards.
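The failing allocation can be sized directly from the error message. The traceback passes through evaluation.py (compute_coco_eval_metric) and Estimator.predict, so the OOM happens in the evaluation pass that runs inside the training job (train_and_eval), not in a training step itself. The sketch below only does arithmetic on numbers printed in the log, assuming 4-byte float32 elements ("type float"). Reading the leading dimension 8 as the batch size of that evaluation pass is an interpretation, not something the log states.

```python
from functools import reduce
import operator

GIB = 1024 ** 3
BYTES_PER_FLOAT32 = 4  # the error says "type float", i.e. 32-bit


def tensor_bytes(shape, bytes_per_elem=BYTES_PER_FLOAT32):
    """Bytes needed for a dense tensor of the given shape."""
    return reduce(operator.mul, shape, 1) * bytes_per_elem


FAILING_SHAPE = (8, 5, 208, 336, 256)  # from "OOM when allocating tensor with shape[...]"
MEMORY_LIMIT = 6926172160              # "memory_limit_" from bfc_allocator
IN_USE = 3865136128                    # "InUse" in the allocator stats

need = tensor_bytes(FAILING_SHAPE)
print(f"requested tensor: {need / GIB:.2f} GiB "
      f"({100 * need / MEMORY_LIMIT:.0f}% of the {MEMORY_LIMIT / GIB:.2f} GiB limit)")
print(f"already in use:   {IN_USE / GIB:.2f} GiB")

# The size is linear in the leading (batch) dimension.
for batch in (8, 4, 2, 1):
    shape = (batch,) + FAILING_SHAPE[1:]
    print(f"leading dim {batch}: {tensor_bytes(shape) / GIB:.2f} GiB")
```

A single request of about 2.67 GiB is roughly 41% of the allocator limit, with 3.60 GiB already in use, and it drops to about 0.33 GiB with a leading dimension of 1. The per-level slice (8, 208, 336, 256) works out to exactly 572,522,496 bytes, the same size as the six large chunks the allocator lists, which supports reading the tensor as five stacked per-level feature maps.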
I suggest trying the following:

  1. Reduce the batch size, e.g. set train_batch_size: 1
  2. Or train a smaller network by lowering the input resolution. Set the width and height to multiples of 64,
    for example image_size: "(640, 1024)"
  3. If possible, use GPUs with more memory, for example a V100
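Suggestion 2 can be sanity-checked before editing the spec. The helper below is hypothetical and not part of TLT: it rejects image_size values whose dimensions are not multiples of 64 and estimates the same stacked feature tensor for a given input size. It assumes the padded level is the stride-4 feature map (an 832 x 1344 input would give the 208 x 336 seen in the failing shape), and that the 5 levels, 256 channels and leading dimension of 8 stay unchanged.

```python
from functools import reduce
import operator

GIB = 1024 ** 3
# Assumptions read off the failing shape [8, 5, 208, 336, 256]:
LEVELS, CHANNELS, STRIDE, BYTES_PER_FLOAT32 = 5, 256, 4, 4


def check_image_size(height, width, multiple=64):
    """Raise ValueError if either dimension is not a multiple of `multiple`."""
    bad = [d for d in (height, width) if d % multiple != 0]
    if bad:
        raise ValueError(f"image_size dimensions {bad} are not multiples of {multiple}")
    return height, width


def stacked_feature_gib(batch, height, width):
    """Estimated size (GiB) of the stacked multilevel feature tensor."""
    check_image_size(height, width)
    shape = (batch, LEVELS, height // STRIDE, width // STRIDE, CHANNELS)
    return reduce(operator.mul, shape, 1) * BYTES_PER_FLOAT32 / GIB


for height, width in [(832, 1344), (640, 1024)]:
    print(f"image_size ({height}, {width}): "
          f"{stacked_feature_gib(8, height, width):.2f} GiB at leading dim 8")
```

Under these assumptions, (640, 1024) brings this one tensor from about 2.67 GiB down to about 1.56 GiB. Because the size is also linear in the leading dimension, combining a smaller input with a smaller batch reduces it further.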