Hi AastaLLL,
I trained a Mask R-CNN model with MobileNet V1 as the backbone. I am able to run it on the Jetson TX2 without the GPU, with CUDA_VISIBLE_DEVICES set to ‘-1’.
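For the CPU-only run I set that variable before TensorFlow is imported, roughly like this (a simplified sketch, not my exact code):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide the GPU so TensorFlow falls back to the CPU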
My session config looks like this:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)  # fall back to CPU when a GPU kernel is unavailable
config.gpu_options.allow_growth = True              # take GPU memory on demand instead of all at once
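The session is then created with this config and inference is run roughly like this (simplified excerpt; tensor_dict, image_tensor and vs come from the rest of my script, with the same names as in the traceback below):

sess = tf.Session(config=config)
# this is the sess.run call that fails on the GPU
output_dict = sess.run(tensor_dict, feed_dict={image_tensor: vs.expanded()})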
But on the GPU it crashes with the following error log:
2018-05-28 09:31:39.483435: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-05-28 09:31:39.643266: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-05-28 09:31:39.807578: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-05-28 09:31:39.847436: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-05-28 09:31:40.529444: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.91GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-05-28 09:31:42.243801: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.32GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
> FPS: 0.0
2018-05-28 09:31:44.285335: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at where_op.cc:331 : Internal: WhereOp: Could not launch cub::DeviceSelect::Flagged to copy indices out, status: too many resources requested for launch
Traceback (most recent call last):
File "run_objectdetection.py", line 204, in <module>
detection(model)
File "run_objectdetection.py", line 140, in detection
output_dict = sess.run(tensor_dict, feed_dict={image_tensor: vs.expanded()})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceSelect::Flagged to copy indices out, status: too many resources requested for launch
[[Node: ClipToWindow/Where = Where[T=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ClipToWindow/Greater)]]
[[Node: BatchMultiClassNonMaxSuppression_1/map/while/Identity/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1167_BatchMultiClassNonMaxSuppression_1/map/while/Identity", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopBatchMultiClassNonMaxSuppression_1/map/while/TensorArrayReadV3_4/_31)]]
Caused by op u'ClipToWindow/Where', defined at:
File "run_objectdetection.py", line 203, in <module>
SPLIT_MODEL, SSD_SHAPE).prepare_od_model()
File "/home/nvidia/realtime_object_detection/stuff/helper.py", line 177, in prepare_od_model
self.load_frozenmodel()
File "/home/nvidia/realtime_object_detection/stuff/helper.py", line 157, in load_frozenmodel
tf.import_graph_def(od_graph_def, name='')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 513, in import_graph_def
_ProcessNewOps(graph)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 303, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3540, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3428, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceSelect::Flagged to copy indices out, status: too many resources requested for launch
[[Node: ClipToWindow/Where = Where[T=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ClipToWindow/Greater)]]
[[Node: BatchMultiClassNonMaxSuppression_1/map/while/Identity/_159 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1167_BatchMultiClassNonMaxSuppression_1/map/while/Identity", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopBatchMultiClassNonMaxSuppression_1/map/while/TensorArrayReadV3_4/_31)]]
This I don’t understand, since on the TX2 the memory is shared between the GPU and the CPU. Can the CPU allocate more memory than the GPU? Is it possible to adjust or change this behavior?
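One thing I was considering (I’m not sure whether it is the right approach) is capping how much of the shared memory the GPU allocator may claim via per_process_gpu_memory_fraction, e.g.:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
# cap the GPU allocator; the 0.5 here is just an example value, not something I have verified
config.gpu_options.per_process_gpu_memory_fraction = 0.5
sess = tf.Session(config=config)

But I don’t know whether that would help with the “too many resources requested for launch” error from the WhereOp, or whether it only affects the allocator warnings.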