L4T 32.2.0
Ubuntu 18.04.3
Kernel Version: 4.9.140-tegra
Python: 3.6.8
CUDA 10.0.326
Xavier PWR Mode: MAXN
TensorFlow: v1.14.0
Model: DeepLab
model_test.py runs successfully, but local_test.sh fails when run as follows
(from the directory tensorflow/models/research/deeplab):
$ sh local_test.sh
Error:
2019-08-27 17:02:47.700148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 120 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2019-08-27 17:03:05.437609: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4B (rounded to 256). Current allocation summary follows.
2019-08-27 17:03:05.437756: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256): Total Chunks: 96, Chunks in use: 96. 24.0KiB allocated for chunks. 24.0KiB in use in bin. 384B client-requested in use in bin.
2019-08-27 17:03:05.437818: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-08-27 17:03:05.437868: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
Eventually followed by:
2019-08-27 17:03:05.439940: I tensorflow/core/common_runtime/bfc_allocator.cc:780] Bin for 256B was 256B, Chunk State:
2019-08-27 17:03:05.440025: I tensorflow/core/common_runtime/bfc_allocator.cc:793] Next region of size 125837312
2019-08-27 17:03:05.440110: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x209e99000 next 1 of size 256
2019-08-27 17:03:05.440194: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x209e99100 next 2 of size 3072
2019-08-27 17:03:05.440277: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x209e99d00 next 3 of size 3072
Eventually followed by:
2019-08-27 17:03:05.461770: I tensorflow/core/common_runtime/bfc_allocator.cc:809] Summary of in-use Chunks by size:
2019-08-27 17:03:05.461800: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 96 Chunks of size 256 totalling 24.0KiB
2019-08-27 17:03:05.461820: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 480 Chunks of size 3072 totalling 1.41MiB
2019-08-27 17:03:05.461841: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 48 Chunks of size 26368 totalling 1.21MiB
2019-08-27 17:03:05.461861: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 47 Chunks of size 2119936 totalling 95.02MiB
2019-08-27 17:03:05.461880: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 23435520 totalling 22.35MiB
2019-08-27 17:03:05.461936: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 120.01MiB
2019-08-27 17:03:05.461963: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 125837312 memory_limit_: 125837312 available bytes: 0 curr_region_allocation_bytes_: 251674624
2019-08-27 17:03:05.462013: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 125837312
InUse: 125837312
MaxInUse: 125837312
NumAllocs: 672
MaxAllocSize: 23435520
2019-08-27 17:03:05.462086: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ********************************************************************************************xxxxxxxx
2019-08-27 17:03:05.462150: W tensorflow/core/framework/op_kernel.cc:1479] OP_REQUIRES failed at constant_op.cc:77 : Resource exhausted: OOM when allocating tensor of shape [] and type float
2019-08-27 17:03:05.462213: E tensorflow/core/common_runtime/executor.cc:648] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [] and type float
[[{{node xception_65/exit_flow/block2/unit_1/xception_module/separable_conv3_pointwise/weights/Initializer/truncated_normal/stddev}}]]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [] and type float
[[{{node xception_65/exit_flow/block2/unit_1/xception_module/separable_conv3_pointwise/weights/Initializer/truncated_normal/stddev}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/NVMe500/workspace/models/research/deeplab/train.py", line 513, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/mnt/NVMe500/workspace/models/research/deeplab/train.py", line 505, in main
hooks=[stop_hook]) as sess:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 647, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/session_manager.py", line 296, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [] and type float
[[node xception_65/exit_flow/block2/unit_1/xception_module/separable_conv3_pointwise/weights/Initializer/truncated_normal/stddev (defined at mnt/NVMe500/workspace/models/research/deeplab/core/xception.py:180) ]]
Original stack trace for 'xception_65/exit_flow/block2/unit_1/xception_module/separable_conv3_pointwise/weights/Initializer/truncated_normal/stddev':
File "mnt/NVMe500/workspace/models/research/deeplab/train.py", line 513, in <module>
tf.app.run()
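As a sanity check on the allocator summary in the log above, the in-use chunks can be added up with a few lines of Python. This is only a rough sketch: the variable names are my own, and the counts and sizes are copied by hand from the "Summary of in-use Chunks by size" lines. The total seems to match memory_limit_ exactly, which suggests the GPU allocator was capped at about 120 MiB (the "with 120 MB memory" in the device-creation line) and was completely full when the 4-byte allocation failed.

```python
# (count, size in bytes) pairs copied from the "Chunks of size ..." log lines.
chunks = [
    (96, 256),
    (480, 3072),
    (48, 26368),
    (47, 2119936),
    (1, 23435520),
]

# memory_limit_ value copied from the bfc_allocator.cc:818 log line.
memory_limit = 125837312

# Total bytes held by in-use chunks.
in_use = sum(count * size for count, size in chunks)
in_use_mib = round(in_use / 2**20, 2)

print("in use (bytes):", in_use)
print("in use (MiB):  ", in_use_mib)
print("equals limit:  ", in_use == memory_limit)
```

If the sum equals the limit, the failure is less about the tiny 4-byte request itself and more about the small pool TensorFlow was given on the device in the first place.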
I am new to TensorFlow and would appreciate any advice on how to resolve this error.
Thank you in advance!