resource exhausted error using tensorflow on jetson nano

I successfully installed jupyter, scipy, matplotlib and all the dependencies necessary to run the notebooks here:

Somehow, I cannot believe I am using the gpu on the nano board because things go quite slowly. However, tegrastats shows blips of 99% GRD utilization, although CPU usage is more consistent.

When I load up lesson 14, deepdream.py, line 24 begins to execute

img_result = recursive_optimize(layer_tensor=layer_tensor, image=image,
                 num_iterations=10, step_size=3.0, rescale_factor=0.7,
                 num_repeats=4, blend=0.2)

but after recursive level 4 I get this error

Processing image: 
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1333     try:
-> 1334       return fn(*args)
   1335     except errors.OpError as e:

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1318       return self._call_tf_sessionrun(
-> 1319           options, feed_dict, fetch_list, target_list, run_metadata)
   1320 

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1406         self._session, options, feed_dict, fetch_list, target_list,
-> 1407         run_metadata)
   1408 

ResourceExhaustedError: OOM when allocating tensor with shape[1,77,120,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node gradients_6/Square_6_grad/Mul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[{{node gradients_6/conv2d0_pre_relu/conv_grad/Conv2DBackpropInput}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-24-ab4fa17c6690> in <module>
      1 img_result = recursive_optimize(layer_tensor=layer_tensor, image=image,
      2                  num_iterations=10, step_size=3.0, rescale_factor=0.7,
----> 3                  num_repeats=4, blend=0.2)

<ipython-input-17-f3924ae5544f> in recursive_optimize(layer_tensor, image, num_repeats, rescale_factor, blend, num_iterations, step_size, tile_size)
     57                                 num_iterations=num_iterations,
     58                                 step_size=step_size,
---> 59                                 tile_size=tile_size)
     60 
     61     return img_result

<ipython-input-16-d6efbaa95b77> in optimize_image(layer_tensor, image, num_iterations, step_size, tile_size, show_gradient)
     35         # maximize the mean of the given layer-tensor.
     36         grad = tiled_gradient(gradient=gradient, image=img,
---> 37                               tile_size=tile_size)
     38 
     39         # Blur the gradient with different amounts and add

<ipython-input-15-295b93b138cd> in tiled_gradient(gradient, image, tile_size)
     50 
     51             # Use TensorFlow to calculate the gradient-value.
---> 52             g = session.run(gradient, feed_dict=feed_dict)
     53 
     54             # Normalize the gradient for the tile. This is

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1150     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1151       results = self._do_run(handle, final_targets, final_fetches,
-> 1152                              feed_dict_tensor, options, run_metadata)
   1153     else:
   1154       results = []

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1326     if handle is None:
   1327       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328                            run_metadata)
   1329     else:
   1330       return self._do_call(_prun_fn, handle, feeds, fetches)

~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1346           pass
   1347       message = error_interpolation.interpolate(message, self._graph)
-> 1348       raise type(e)(node_def, op, message)
   1349 
   1350   def _extend_graph(self):

ResourceExhaustedError: OOM when allocating tensor with shape[1,77,120,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node gradients_6/Square_6_grad/Mul (defined at /home/stefan/Documents/TensorFlow-Tutorials-master/inception5h.py:167) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[node gradients_6/conv2d0_pre_relu/conv_grad/Conv2DBackpropInput (defined at /home/stefan/Documents/TensorFlow-Tutorials-master/inception5h.py:167) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'gradients_6/Square_6_grad/Mul', defined at:
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/stefan/.local/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 148, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib/python3.6/asyncio/base_events.py", line 427, in run_forever
    self._run_once()
  File "/usr/lib/python3.6/asyncio/base_events.py", line 1440, in _run_once
    handle._run()
  File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/ioloop.py", line 690, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/gen.py", line 781, in inner
    self.run()
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.send(value)
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/home/stefan/.local/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/stefan/.local/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2848, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/stefan/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2874, in _run_cell
    return runner(coro)
  File "/home/stefan/.local/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/home/stefan/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3049, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/stefan/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3214, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/home/stefan/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-24-ab4fa17c6690>", line 3, in <module>
    num_repeats=4, blend=0.2)
  File "<ipython-input-17-f3924ae5544f>", line 59, in recursive_optimize
    tile_size=tile_size)
  File "<ipython-input-16-d6efbaa95b77>", line 30, in optimize_image
    gradient = model.get_gradient(layer_tensor)
  File "/home/stefan/Documents/TensorFlow-Tutorials-master/inception5h.py", line 167, in get_gradient
    gradient = tf.gradients(tensor_mean, self.input)[0]
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 664, in gradients
    unconnected_gradients)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 965, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 420, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 965, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py", line 444, in _SquareGrad
    return math_ops.multiply(grad, math_ops.multiply(x, y))
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 248, in multiply
    return gen_math_ops.mul(x, y, name)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5860, in mul
    "Mul", x=x, y=y, name=name)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'Square_6', defined at:
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
[elided 28 identical lines from previous traceback]
  File "<ipython-input-16-d6efbaa95b77>", line 30, in optimize_image
    gradient = model.get_gradient(layer_tensor)
  File "/home/stefan/Documents/TensorFlow-Tutorials-master/inception5h.py", line 159, in get_gradient
    tensor = tf.square(tensor)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 9389, in square
    "Square", x=x, name=name)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/stefan/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,77,120,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node gradients_6/Square_6_grad/Mul (defined at /home/stefan/Documents/TensorFlow-Tutorials-master/inception5h.py:167) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[node gradients_6/conv2d0_pre_relu/conv_grad/Conv2DBackpropInput (defined at /home/stefan/Documents/TensorFlow-Tutorials-master/inception5h.py:167) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Is this an issue with the tensorflow implementation or am I just running out of ram?

Hi,

You are running out of the memory:

ResourceExhaustedError (see above for traceback): <b>OOM when allocating tensor with shape[1,77,120,192]</b> and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Please noticed that Nano only has 4G memory.
Please check your memory usage with tegrastats first:

sudo tegrastats

Thanks.

I have the same problem. Is there any way to use less memory with TensorFlow?

I solved the problem by disabling the GUI

https://devtalk.nvidia.com/default/topic/1049266/jetson-nano/headless-os/

Hello, I have some question about training object detection API tensorflow 2.0 on jetson nano. I try to install and training, data from .XML file convert to csv and then recornd. I don’t know why I always get LOSS = nan every 100 steps.

model : ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8
config file :

model {
ssd {
num_classes: 4
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
feature_extractor {
type: “ssd_mobilenet_v2_fpn_keras”
depth_multiplier: 1.0
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.9999998989515007e-05
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
use_depthwise: true
override_base_feature_extractor_hyperparams: true
fpn {
min_level: 3
max_level: 7
additional_layer_depth: 128
}
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 3.9999998989515007e-05
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.009999999776482582
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019165
scale: true
epsilon: 0.0010000000474974513
}
}
depth: 128
num_layers_before_predictor: 4
kernel_size: 3
class_prediction_bias_init: -4.599999904632568
share_prediction_tower: true
use_depthwise: true
}
}
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
scales_per_octave: 2
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 9.99999993922529e-09
iou_threshold: 0.6000000238418579
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: false
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid_focal {
gamma: 2.0
alpha: 0.25
}
}
classification_weight: 1.0
localization_weight: 1.0
}
encode_background_as_zeros: true
normalize_loc_loss_by_codesize: true
inplace_batchnorm_update: true
freeze_batchnorm: false
}
}
train_config {
batch_size: 1
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
sync_replicas: true
optimizer {
momentum_optimizer {
learning_rate {
cosine_decay_learning_rate {
learning_rate_base: 0.07999999821186066
total_steps: 50000
warmup_learning_rate: 0.026666000485420227
warmup_steps: 1000
}
}
momentum_optimizer_value: 0.8999999761581421
}
use_moving_average: false
}
fine_tune_checkpoint: “/home/phanith/Desktop/object_detection/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0”
num_steps: 50000
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
fine_tune_checkpoint_type: “detection”
fine_tune_checkpoint_version: V2
}
train_input_reader {
label_map_path: “/home/phanith/Desktop/object_detection/data/label_map1.pbtxt”
tf_record_input_reader {
input_path: “/home/phanith/Desktop/object_detection/data/train.record”
}
}
eval_config {
metrics_set: “coco_detection_metrics”
use_moving_averages: false
}
eval_input_reader {
label_map_path: “/home/phanith/Desktop/object_detection/data/label_map1.pbtxt”
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: “/home/phanith/Desktop/object_detection/data/test.record”
}
}