Using a Jetson TX2 with JetPack 4.3, Python 3.5.2, and TensorFlow 1.9.0.
Hello,
When I run SSD inference on a 300x300 image on the CPU with the following configuration:

config = tf.ConfigProto(device_count={"GPU": 0})

it takes about 2.89 s and returns the result correctly.
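For context, here is roughly what test_SSD.py does (a minimal sketch: the checkpoint path and the sess.run call match the traceback below, but the tensor names passed to get_tensor_by_name and the dummy input are illustrative placeholders, not my exact code):

import numpy as np
import tensorflow as tf

# CPU-only run: device_count={"GPU": 0} hides the GPU from TensorFlow.
# Dropping this config (plain tf.ConfigProto()) lets ops run on the GPU,
# which is the case that fails below.
config = tf.ConfigProto(device_count={"GPU": 0})

with tf.Session(config=config) as sess:
    # Restore the SSD300 graph and weights (same checkpoint as in the traceback).
    saver = tf.train.import_meta_graph('./ssd_ckpt/ssd300.ckpt.meta')
    saver.restore(sess, './ssd_ckpt/ssd300.ckpt')

    graph = tf.get_default_graph()
    # Placeholder tensor names for illustration; my graph has its own node names.
    ssd_input = graph.get_tensor_by_name('input:0')
    ssd_output = graph.get_tensor_by_name('decoded_predictions:0')

    # A single 300x300 RGB image; dummy data shown here.
    image_resized = np.zeros((1, 300, 300, 3), dtype=np.float32)
    y_pred = sess.run([ssd_output], feed_dict={ssd_input: image_resized})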
But when I run the same inference on the GPU, the following error occurs:
2019-05-17 10:46:40.312189: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-05-17 10:46:41.100993: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-05-17 10:46:41.158767: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.37GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-05-17 10:46:41.343459: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.18GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-05-17 10:46:41.429324: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.06GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-05-17 10:46:41.508449: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-05-17 10:46:41.650280: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at where_op.cc:286 : Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 767, status: too many resources requested for launch
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 767, status: too many resources requested for launch
[[Node: decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Where = Where[T=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Reshape_1)]]
[[Node: decoded_predictions/loop_over_batch/while/loop_over_classes/while/cond/strided_slice/_209 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1016_...ided_slice", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopdecoded_predictions/loop_over_batch/while/loop_over_classes/while/TensorArrayReadV3/_158)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_SSD.py", line 41, in <module>
y_pred = sess.run([ssd_output], feed_dict = {ssd_input: image_resized})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 767, status: too many resources requested for launch
[[Node: decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Where = Where[T=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Reshape_1)]]
[[Node: decoded_predictions/loop_over_batch/while/loop_over_classes/while/cond/strided_slice/_209 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1016_...ided_slice", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopdecoded_predictions/loop_over_batch/while/loop_over_classes/while/TensorArrayReadV3/_158)]]
Caused by op 'decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Where', defined at:
File "test_SSD.py", line 9, in <module>
saver = tf.train.import_meta_graph('./ssd_ckpt/ssd300.ckpt.meta')
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3563, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3563, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3450, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 767, status: too many resources requested for launch
[[Node: decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Where = Where[T=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoded_predictions/loop_over_batch/while/loop_over_classes/while/boolean_mask/Reshape_1)]]
[[Node: decoded_predictions/loop_over_batch/while/loop_over_classes/while/cond/strided_slice/_209 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1016_...ided_slice", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopdecoded_predictions/loop_over_batch/while/loop_over_classes/while/TensorArrayReadV3/_158)]]
Why does this happen?