InternalError - temp_storage_bytes: 1 -----HELP!!!

devin.x.zhou · February 7, 2018, 12:01am

I am running Mask RCNN object detection and I get the same internal error with two RCNN models of different sizes. The demos are in this repository:

I trained the shapes model (only circles, triangles, and squares instead of the regular 80-class object detection) on my desktop PC and got an h5 model with a much smaller backbone structure than the full object detection model. I then used this h5 file on the Jetson TX2, but I still got the internal error below:

My question is: should I expand the swap memory for the GPU or the CPU? Or should I install a 60 GB SSD for the whole system? Or maybe reduce the number of ROI detections to use fewer registers on the CPU?

InternalError Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1326 try:
→ 1327 return fn(*args)
1328 except errors.OpError as e:

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1305 feed_dict, fetch_list, target_list,
→ 1306 status, run_metadata)
1307

/usr/lib/python3.5/contextlib.py in __exit__(self, type, value, traceback)
65 try:
—> 66 next(self.gen)
67 except StopIteration:

~/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
→ 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:

InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: too many resources requested for launch
[[Node: roi_align_classifier_1/Where_2 = Where_device=“/job:localhost/replica:0/task:0/gpu:0”]]
[[Node: roi_align_classifier_1/Cast_5/_5545 = _Recvclient_terminated=false, recv_device=“/job:localhost/replica:0/task:0/cpu:0”, send_device=“/job:localhost/replica:0/task:0/gpu:0”, send_device_incarnation=1, tensor_name=“edge_3201_roi_align_classifier_1/Cast_5”, tensor_type=DT_INT32, _device=“/job:localhost/replica:0/task:0/cpu:0”]]

During handling of the above exception, another exception occurred:

InternalError Traceback (most recent call last)
in ()
----> 1 results = model.detect([original_image], verbose=1)
2
3 r = results[0]
4 visualize.display_instances(original_image, r[‘rois’], r[‘masks’], r[‘class_ids’],
5 dataset_val.class_names, r[‘scores’], ax=get_ax())

~/Desktop/Mask_RCNN-master/model.py in detect(self, images, verbose)
2338 detections, mrcnn_class, mrcnn_bbox, mrcnn_mask,
2339 rois, rpn_class, rpn_bbox =
→ 2340 self.keras_model.predict([molded_images, image_metas], verbose=0)
2341 # Process detections
2342 results =

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
1798 f = self.predict_function
1799 return self._predict_loop(f, ins, batch_size=batch_size,
→ 1800 verbose=verbose, steps=steps)
1801
1802 def train_on_batch(self, x, y,

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in _predict_loop(self, f, ins, batch_size, verbose, steps)
1299 ins_batch[i] = ins_batch[i].toarray()
1300
→ 1301 batch_outs = f(ins_batch)
1302 if not isinstance(batch_outs, list):
1303 batch_outs = [batch_outs]

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
2473 session = get_session()
2474 updated = session.run(fetches=fetches, feed_dict=feed_dict,
→ 2475 **self.session_kwargs)
2476 return updated[:len(self.outputs)]
2477

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
→ 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1122 if final_fetches or final_targets or (handle and feed_dict_tensor):
1123 results = self._do_run(handle, final_targets, final_fetches,
→ 1124 feed_dict_tensor, options, run_metadata)
1125 else:
1126 results =

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1319 if handle is None:
1320 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
→ 1321 options, run_metadata)
1322 else:
1323 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1338 except KeyError:
1339 pass
→ 1340 raise type(e)(node_def, op, message)
1341
1342 def _extend_graph(self):

InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: too many resources requested for launch
[[Node: roi_align_classifier_1/Where_2 = Where_device=“/job:localhost/replica:0/task:0/gpu:0”]]
[[Node: roi_align_classifier_1/Cast_5/_5545 = _Recvclient_terminated=false, recv_device=“/job:localhost/replica:0/task:0/cpu:0”, send_device=“/job:localhost/replica:0/task:0/gpu:0”, send_device_incarnation=1, tensor_name=“edge_3201_roi_align_classifier_1/Cast_5”, tensor_type=DT_INT32, _device=“/job:localhost/replica:0/task:0/cpu:0”]]

Caused by op ‘roi_align_classifier_1/Where_2’, defined at:
File “/usr/lib/python3.5/runpy.py”, line 184, in _run_module_as_main
“__main__”, mod_spec)
File “/usr/lib/python3.5/runpy.py”, line 85, in _run_code
exec(code, run_globals)
File “/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py”, line 16, in
app.launch_new_instance()
File “/usr/local/lib/python3.5/dist-packages/traitlets/config/application.py”, line 658, in launch_instance
app.start()
File “/usr/local/lib/python3.5/dist-packages/ipykernel/kernelapp.py”, line 478, in start
self.io_loop.start()
File “/usr/local/lib/python3.5/dist-packages/zmq/eventloop/ioloop.py”, line 177, in start
super(ZMQIOLoop, self).start()
File “/usr/local/lib/python3.5/dist-packages/tornado/ioloop.py”, line 888, in start
handler_func(fd_obj, events)
File “/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py”, line 277, in null_wrapper
return fn(*args, **kwargs)
File “/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py”, line 440, in _handle_events
self._handle_recv()
File “/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py”, line 472, in _handle_recv
self._run_callback(callback, msg)
File “/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py”, line 414, in _run_callback
callback(*args, **kwargs)
File “/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py”, line 277, in null_wrapper
return fn(*args, **kwargs)
File “/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py”, line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File “/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py”, line 233, in dispatch_shell
handler(stream, idents, msg)
File “/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py”, line 399, in execute_request
user_expressions, allow_stdin)
File “/usr/local/lib/python3.5/dist-packages/ipykernel/ipkernel.py”, line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File “/usr/local/lib/python3.5/dist-packages/ipykernel/zmqshell.py”, line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File “/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py”, line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File “/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py”, line 2850, in run_ast_nodes
if self.run_code(code, result):
File “/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py”, line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 15, in
model_dir=DIR)
File “/home/nvidia/Desktop/Mask_RCNN-master/model.py”, line 1735, in __init__
self.keras_model = self.build(mode=mode, config=config)
File “/home/nvidia/Desktop/Mask_RCNN-master/model.py”, line 1918, in build
config.POOL_SIZE, config.NUM_CLASSES)
File “/home/nvidia/Desktop/Mask_RCNN-master/model.py”, line 876, in fpn_classifier_graph
name=“roi_align_classifier”)([rois] + feature_maps)
File “/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py”, line 617, in __call__
output = self.call(inputs, **kwargs)
File “/home/nvidia/Desktop/Mask_RCNN-master/model.py”, line 373, in call
ix = tf.where(tf.equal(roi_level, level))
File “/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py”, line 2365, in where
return gen_array_ops.where(input=condition, name=name)
File “/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py”, line 4053, in where
result = _op_def_lib.apply_op(“Where”, input=input, name=name)
File “/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 767, in apply_op
op_def=op_def)
File “/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File “/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: too many resources requested for launch
[[Node: roi_align_classifier_1/Where_2 = Where_device=“/job:localhost/replica:0/task:0/gpu:0”]]
[[Node: roi_align_classifier_1/Cast_5/_5545 = _Recvclient_terminated=false, recv_device=“/job:localhost/replica:0/task:0/cpu:0”, send_device=“/job:localhost/replica:0/task:0/gpu:0”, send_device_incarnation=1, tensor_name=“edge_3201_roi_align_classifier_1/Cast_5”, tensor_type=DT_INT32, _device=“/job:localhost/replica:0/task:0/cpu:0”]]

AastaLLL · February 7, 2018, 9:14am

Hi,

From the error log, you hit an out-of-resource issue in both cases.
If you launch TensorFlow in GPU mode, adding swap space may not help, since swap is not a GPU-accessible resource.

Our recommendation is to lower the memory usage of the launched model.
For example, reduce the network input size, reduce the number of layers, or use a simpler model.
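
As a rough illustration of the "reduce the network input size" suggestion, here is a minimal Python sketch of a reduced inference configuration. It assumes the Mask_RCNN code follows the usual pattern of a Config base class whose attributes are overridden in a subclass. The Config class below is only a hypothetical stand-in so the snippet runs on its own, and its attribute names and default values are assumptions to check against the config.py in your copy of the repository. SmallInferenceConfig and input_pixels are made-up names for this example, and the reduced values are illustrative, not tuned for the TX2.

```python
# Hypothetical stand-in for the repository's Config base class.
# The attribute names and defaults here are assumptions; in the real
# project you would subclass the Config class from the repository instead.
class Config:
    NAME = None
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
    POST_NMS_ROIS_INFERENCE = 1000
    DETECTION_MAX_INSTANCES = 100


class SmallInferenceConfig(Config):
    """Illustrative reduced-footprint settings for a memory-constrained GPU."""
    NAME = "shapes"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1             # run detection on one image at a time
    IMAGE_MIN_DIM = 128            # smaller network input
    IMAGE_MAX_DIM = 128            # kept a multiple of 64 for the feature pyramid
    POST_NMS_ROIS_INFERENCE = 200  # fewer proposals passed on to ROI align
    DETECTION_MAX_INSTANCES = 20   # fewer final detections kept


def input_pixels(cfg):
    """Pixels fed to the network per batch, assuming square padded inputs."""
    return cfg.IMAGE_MAX_DIM ** 2 * cfg.IMAGES_PER_GPU


if __name__ == "__main__":
    ratio = input_pixels(Config) // input_pixels(SmallInferenceConfig)
    print("Input pixels per batch reduced by a factor of", ratio)
```

An instance of such a class would be passed as the config argument when the model is constructed in inference mode. Whether this is enough to avoid the error on the TX2 is not guaranteed; it only shows which settings control the input size and the number of ROIs.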

Thanks.

devin.x.zhou · February 13, 2018, 6:31am

After a lot of searching, I found that this link helps. It uses a much smaller neural network written in C++:

https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet