Given a .engine file and an .h5 file, how can the model be incorporated into DeepStream?

If you don’t know what your engine does, e.g. num-detected-classes, which is information that is not included in the engine file,
I don’t know how we can help you.

Could you perhaps point to a complete list of the properties required for a Keras model, as a reference?
Do you have any Keras reference with the full list of parameters,
so that I can try to retrieve the model properties from the supplier according to that list?
Will that work? Likely? Unlikely? Highly unlikely? I assume DeepStream is agnostic to the model parameters as long as they are supplied, but how do I determine exactly which of them need to be supplied?

Suppose I have a Python wrapper for running the .engine model file with TensorRT.
Could the missing parameters be retrieved from there? Or are they not present for TensorRT runtime execution, yet required by DeepStream? Why would DeepStream require parameters that the TensorRT runtime doesn’t require?
@mchi ?
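On the question of what a Python TensorRT wrapper can reveal: as far as I understand, a serialized engine exposes its binding names, shapes and data types, but not labels, thresholds or pre-processing constants. A minimal sketch, assuming the TensorRT 7-era Python API (`num_bindings`, `get_binding_name` and friends; newer releases changed these calls) and a hypothetical file name passed to `load_engine`:

```python
def describe_bindings(engine):
    """Return name, direction, shape and dtype for every binding of an engine."""
    bindings = []
    for i in range(engine.num_bindings):
        bindings.append({
            "name": engine.get_binding_name(i),
            "is_input": bool(engine.binding_is_input(i)),
            "shape": tuple(engine.get_binding_shape(i)),
            "dtype": str(engine.get_binding_dtype(i)),
        })
    return bindings


def load_engine(path):
    """Deserialize an engine file; needs the tensorrt package and a GPU."""
    import tensorrt as trt  # imported lazily so describe_bindings works without it

    logger = trt.Logger(trt.Logger.WARNING)
    with open(path, "rb") as f, trt.Runtime(logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())
```

Calling `describe_bindings(load_engine("model.engine"))` on the device would list the tensors; anything beyond that (class count semantics, label names, normalization) still has to come from the model supplier.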

Much of the information is about post-processing if you provide a TRT engine to DeepStream.
I’m not sure what post-processing your model requires, so I’m not sure whether DeepStream already supports it.

Can you provide more info about your model? Otherwise it’s hard to say whether it is likely, unlikely, etc.
Alternatively, the DS documentation explains the meaning and usage of the nvinfer parameters; you could go through them and cross-check against your model.

One parameter has been retrieved from the model supplier:
num_classes = 1
It seems there are still many parameters to find out?

yes, you need to get them outside of the model itself
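To make the list of things to ask the supplier concrete, here is a rough sketch that writes the `[property]` group of a gst-nvinfer config file. The keys used are standard nvinfer properties as I understand them from the DS documentation; the file names and default values are placeholders, not values from this thread, and a real model will likely need further keys (for example the color format or a custom parser).

```python
import configparser
import io


def build_nvinfer_config(engine_file, label_file, num_classes,
                         network_type=0, batch_size=1, net_scale_factor=1.0):
    """Render a minimal [property] group for a gst-nvinfer config file.

    network_type follows the nvinfer convention: 0 = detector, 1 = classifier.
    All values here must come from whoever trained the model.
    """
    cfg = configparser.ConfigParser()
    cfg["property"] = {
        "gpu-id": "0",
        "gie-unique-id": "1",
        "model-engine-file": engine_file,
        "labelfile-path": label_file,
        "batch-size": str(batch_size),
        "network-type": str(network_type),
        "num-detected-classes": str(num_classes),
        "net-scale-factor": str(net_scale_factor),
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Placeholder file names; num_classes=1 is the one value known so far.
print(build_nvinfer_config("model.engine", "labels.txt", num_classes=1))
```

Each entry in the dictionary is effectively one question for the supplier that the engine file alone does not answer.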

@mchi
Would it be easier to incorporate the following solution into DeepStream, given that the sources are provided as-is in the documentation below? Would you be able to help with such an implementation?

I think it’s feasible. I think the steps should be:

  1. convert the pytorch model to onnx
  2. configure the gie config with the onnx, and implement the detection post processing
  3. inference with video or image as input for DeepStream

Attempt 1.
Step 1.
Downloading the dataset.

wget https://github.com/javathunderman/retinopathy-dataset/archive/master.zip

Downloading the frozen graph:

wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip
 unzip inception_dec_2015.zip 
Archive:  inception_dec_2015.zip
  inflating: imagenet_comp_graph_label_strings.txt  
  inflating: LICENSE                 
  inflating: tensorflow_inception_graph.pb  

Using the DeepStream 5 environment:

c4e41ec4dce6        nvcr.io/nvidia/deepstream-l4t:5.0-dp-20.04-samples 
docker start c4e41ec4dce6
docker exec -it c4e41ec4dce6 bash
:/opt/nvidia/deepstream/deepstream-5.0# 

:/import# python3 -m tf2onnx.convert --input tensorflow_inception_graph.pb  --output tensorflow_inception_graph.onnx
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.6/dist-packages/tf2onnx/__init__.py", line 14, in <module>
    from . import verbose_logging as logging
  File "/usr/local/lib/python3.6/dist-packages/tf2onnx/verbose_logging.py", line 14, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
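The ImportError above names `libcublas.so.10.0`, which suggests the installed TensorFlow build expects a different CUDA version than the one present in the container. A small sketch for checking which versions of a library are actually on disk; the search directories in the usage note are assumptions about a typical Jetson layout, not something taken from this container:

```python
import glob
import os


def find_library_versions(stem, search_dirs):
    """List the file names matching <stem>.so* in the given directories."""
    found = set()
    for directory in search_dirs:
        for path in glob.glob(os.path.join(directory, stem + ".so*")):
            found.add(os.path.basename(path))
    return sorted(found)
```

For example, `find_library_versions("libcublas", ["/usr/local/cuda/lib64", "/usr/lib/aarch64-linux-gnu"])` would show whether only `libcublas.so.10` and later point releases are present, in which case a TensorFlow build matching that CUDA version is probably the cleaner fix than a symlink.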

Further on it ran into a cusolver issue; the library belongs to a read-only part of the system, so it cannot be adjusted with a symlink as in the example above.
Trying a different container from NGC (ML-tensorflow):

/import# python3 -m tf2onnx.convert --input tensorflow_inception_graph.pb  --output tensorflow_inception_graph.onnx
2020-09-03 10:48:07.690127: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 10:48:15.147892: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-03 10:48:15.153429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.153618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7.59GiB deviceMemoryBandwidth: 66.10GiB/s
2020-09-03 10:48:15.153712: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 10:48:15.224170: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-03 10:48:15.305383: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-03 10:48:15.417929: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-03 10:48:15.554397: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-03 10:48:15.631347: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-03 10:48:15.632692: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-03 10:48:15.633105: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.633556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.633783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-03 10:48:15.669545: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2020-09-03 10:48:15.670181: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4f27350 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-03 10:48:15.670261: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-03 10:48:15.835514: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.836804: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5093da0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-03 10:48:15.836950: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-09-03 10:48:15.863081: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.863471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7.59GiB deviceMemoryBandwidth: 66.10GiB/s
2020-09-03 10:48:15.863658: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 10:48:15.864142: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-03 10:48:15.864439: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-03 10:48:15.864609: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-03 10:48:15.864705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-03 10:48:15.864817: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-03 10:48:15.864893: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-03 10:48:15.865351: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.865715: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:15.865973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-09-03 10:48:15.866410: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 10:48:22.266070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-03 10:48:22.266233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-09-03 10:48:22.266291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-09-03 10:48:22.266850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:22.267934: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 10:48:22.268273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2525 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-09-03 10:48:23.903300: W tensorflow/core/framework/op_def_util.cc:371] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/tf2onnx/convert.py", line 171, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/tf2onnx/convert.py", line 125, in main
    graph_def, inputs, outputs = tf_loader.from_graphdef(args.graphdef, args.inputs, args.outputs)
  File "/usr/local/lib/python3.6/dist-packages/tf2onnx/tf_loader.py", line 150, in from_graphdef
    frozen_graph = freeze_session(sess, input_names=input_names, output_names=output_names)
  File "/usr/local/lib/python3.6/dist-packages/tf2onnx/tf_loader.py", line 113, in freeze_session
    output_node_names = [i.split(':')[:-1][0] for i in output_names]
TypeError: 'NoneType' object is not iterable
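Reading the traceback above: the failure happens in `freeze_session` while iterating over `output_names`, which is `None`. That is consistent with the converter having been called without `--inputs` and `--outputs`, which a frozen GraphDef needs. A sketch that builds the command with both flags; the tensor names in the usage note are hypothetical placeholders and have to be replaced with the real node names of the graph:

```python
import sys


def tf2onnx_command(graphdef, output, inputs, outputs, opset=None):
    """Build the argv list for converting a frozen GraphDef with tf2onnx."""
    if not inputs or not outputs:
        raise ValueError("a frozen GraphDef needs explicit input and output tensor names")
    cmd = [
        sys.executable, "-m", "tf2onnx.convert",
        "--input", graphdef,
        "--output", output,
        "--inputs", ",".join(inputs),    # e.g. "name:0", comma separated
        "--outputs", ",".join(outputs),
    ]
    if opset is not None:
        cmd += ["--opset", str(opset)]
    return cmd
```

The result can be passed to `subprocess.run(cmd, check=True)`, for instance `tf2onnx_command("tensorflow_inception_graph.pb", "tensorflow_inception_graph.onnx", ["input_tensor:0"], ["output_tensor:0"])` once the actual tensor names are known.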

Probably I should try with TensorFlow 1.4?
Predictably, the first attempt to follow the instructions and install the specific version 1.4.0 of TensorFlow fails:

 pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==1.4.0+nv20.08         
Looking in indexes: https://pypi.org/simple, https://developer.download.nvidia.com/compute/redist/jp/v44
ERROR: Could not find a version that satisfies the requirement tensorflow==1.4.0+nv20.08 (from versions: 1.15.2+nv20.4, 1.15.2+nv20.6, 1.15.3+nv20.7, 1.15.3+nv20.8, 2.1.0+nv20.4, 2.2.0+nv20.6, 2.2.0+nv20.7, 2.2.0+nv20.8)
ERROR: No matching distribution found for tensorflow==1.4.0+nv20.08
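The pip error already lists what the index offers, so the closest 1.x build can be picked from it. A small sketch that extracts the version list from such a message and keeps only the TensorFlow 1.x entries; it only parses text and assumes the message has the "(from versions: ...)" form shown above:

```python
import re


def available_versions(pip_error):
    """Extract the version list from pip's 'from versions: ...' message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if not match:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def tf1_versions(versions):
    """Keep only versions whose major number is 1."""
    return [v for v in versions if v.split(".")[0] == "1"]
```

Applied to the output above, this leaves only the 1.15.x builds, which matches the idea of installing 'tensorflow<2' instead of 1.4.0.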

Since version 1.4.0 does not appear to be available, I have to use

$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 'tensorflow<2'

Hi @Andrey1984,
This is a new topic that has nothing to do with the original issue; please file a new topic.
And if the original questions you asked have been addressed, could you mark this one closed?

BTW, for the ONNX conversion, the .pb file alone is not enough.

The original concerns haven’t been resolved;
there is still a need to import the model, or at least get it executed with TensorRT within DeepStream.
However, it seems that the subtask of implementing the Intel scenario, if it works, will be applicable to the original issue.
@mchi will you be able to assist with implementing the Intel scenario in the separate topic here:
which steps / components need to be added in order for the ONNX conversion to get through?
New thread: Implementing DeepStream / TRT integration by Intel’s scenario

what concern?

The concern is to get a model, similar to the one by Intel, integrated into DeepStream using TRT.
As Intel provides open sources and a full definition of the steps, it seems to make sense to try getting the integration done with it, to see if it works, given that ample sources are provided for the Intel scenario, including the dataset images.

You can refer to https://software.intel.com/content/www/us/en/develop/articles/detecting-diabetic-retinopathy-using-deep-learning-on-intel-architecture.html to train the model with the dataset, or just use the model it provides.
Providing a model is outside the scope of DeepStream support.

Once the model is trained / provided from the mentioned article,
will the integration of the model into DeepStream be within or outside the scope of DeepStream support?

Depends on what the issue is.
For example, if a user customizes a model so that it needs customized post-processing, the user should implement it themselves, since DeepStream provides the interface for the post-processing.

In the given scenario it is an image classification model
that just predicts, with some probability, whether the image shows the disease or not.
It doesn’t imply post-processing, does it?
Moreover, it seems that the model is produced by applying the following algorithm to the dataset:

python retrain.py \
  --bottleneck_dir=bottlenecks \
  --how_many_training_steps=300 \
  --model_dir=inception \
  --output_graph=retrained_graph.pb \
  --output_labels=retrained_labels.txt \
  --image_dir=<>

the code above seems agnostic to post processing
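Strictly speaking, even a classifier has a small post-processing step: turning the output probabilities into a label. My understanding is that gst-nvinfer covers this for classifier networks through a label file and a classifier threshold, so no custom parser should be needed, but I have not verified that for this model. What that step amounts to can be sketched as follows, with the label order and threshold as illustrative assumptions:

```python
def classify(probabilities, labels, threshold=0.5):
    """Pick the most probable label, or None if it is below the threshold."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per output probability")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    score = probabilities[best]
    if score < threshold:
        return None, score
    return labels[best], score


# Labels as they would appear in retrained_labels.txt (order is an assumption).
print(classify([0.2, 0.8], ["diseased", "nondiseased"]))
```

The label order has to match the file written by `--output_labels`, which is one more piece of information that lives outside the model itself.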

If so, I think it should be fine. So, no concern, right?

After digging deeper into the Intel article, it turned out that many pieces of the puzzle are missing.
However, as it has dataset sources, it should be possible to simply train a model with the Google AI interface.
After uploading the datasets it will become visible which options they support for exporting the model.

However, following Intel’s article: attempt #1.

git clone https://github.com/javathunderman/diabetic-retinopathy-screening
cd diabetic-retinopathy-screening/
git clone https://github.com/Nomikxyz/retinopathy-dataset
mkdir images
cd images
mkdir diseased
mkdir nondiseased
cd ..

Then copy ~250 files from the symptoms folder of retinopathy-dataset to the diseased folder and ~250 images from the nosymptoms folder to the nondiseased folder.
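The copy step can be scripted so that it is repeatable. A minimal sketch, assuming the images are plain JPEG/PNG files sitting directly in the source folder; the function name and the file patterns are my own choices, and the source paths depend on how the dataset repository is laid out after cloning:

```python
import shutil
from pathlib import Path


def copy_sample(src_dir, dst_dir, limit=250, patterns=("*.jpeg", "*.jpg", "*.png")):
    """Copy up to `limit` images from src_dir into dst_dir; return the count copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for pattern in patterns for p in src.glob(pattern))
    selected = files[:limit]
    for path in selected:
        shutil.copy2(path, dst / path.name)
    return len(selected)
```

It would be called once per class, for example with `images/diseased` and `images/nondiseased` as the destination folders, so that `retrain.py` sees one sub-folder per label.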

Running the retrain script as per Intel’s tutorial:

 python3 retrain.py   --bottleneck_dir=bottlenecks   --how_many_training_steps=300   --model_dir=inception   --output_graph=retrained_graph.pb   --output_labels=retrained_labels.txt   --image_dir=images/
2020-09-03 21:31:50.740715: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From retrain.py:1063: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From retrain.py:773: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

W0903 21:31:57.186100 548329693200 module_wrapper.py:139] From retrain.py:773: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

WARNING:tensorflow:From retrain.py:774: The name tf.gfile.DeleteRecursively is deprecated. Please use tf.io.gfile.rmtree instead.

W0903 21:31:57.186951 548329693200 module_wrapper.py:139] From retrain.py:774: The name tf.gfile.DeleteRecursively is deprecated. Please use tf.io.gfile.rmtree instead.

WARNING:tensorflow:From retrain.py:775: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

W0903 21:31:57.189463 548329693200 module_wrapper.py:139] From retrain.py:775: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

WARNING:tensorflow:From retrain.py:248: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

W0903 21:32:00.193557 548329693200 module_wrapper.py:139] From retrain.py:248: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-09-03 21:32:00.575117: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-09-03 21:32:00.680931: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:00.681140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2020-09-03 21:32:00.681223: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 21:32:00.806538: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-03 21:32:00.927444: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-03 21:32:01.063370: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-03 21:32:01.133752: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-03 21:32:01.194147: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-03 21:32:01.248698: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-03 21:32:01.250449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:01.252056: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:01.252207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-03 21:32:01.279084: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-09-03 21:32:01.279771: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3bd50110 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-03 21:32:01.280047: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-03 21:32:01.370448: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:01.371520: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3bda7c70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-03 21:32:01.371653: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-09-03 21:32:01.372743: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:01.372967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2020-09-03 21:32:01.373225: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 21:32:01.373406: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-03 21:32:01.373497: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-03 21:32:01.373560: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-03 21:32:01.373710: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-03 21:32:01.373840: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-03 21:32:01.374012: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-03 21:32:01.374204: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:01.374432: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:01.374519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-03 21:32:01.374623: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 21:32:03.030578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-03 21:32:03.030881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181]      0 
2020-09-03 21:32:03.030990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0:   N 
2020-09-03 21:32:03.031640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:03.032137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:03.032520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 261 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From retrain.py:252: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

W0903 21:32:03.078023 548329693200 module_wrapper.py:139] From retrain.py:252: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

2020-09-03 21:32:07.218916: W tensorflow/core/framework/op_def_util.cc:357] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
Looking for images in 'diseased'
Looking for images in 'nondiseased'
2020-09-03 21:32:08.310473: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:08.317415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2020-09-03 21:32:08.410976: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-09-03 21:32:08.463758: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-09-03 21:32:08.463979: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-03 21:32:08.476783: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-03 21:32:08.500267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-03 21:32:08.523746: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-03 21:32:08.547276: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-03 21:32:08.547591: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:08.548069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:08.548238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1762] Adding visible gpu devices: 0
2020-09-03 21:32:08.548783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1175] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-03 21:32:08.548841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181]      0 
2020-09-03 21:32:08.549917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] 0:   N 
2020-09-03 21:32:08.550348: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:08.550734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-09-03 21:32:08.551027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1320] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 261 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
Creating bottleneck at bottlenecks/diseased/13638_left.jpeg.txt
2020-09-03 21:32:55.810282: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-09-03 21:33:55.315417: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:33:56.602337: W tensorflow/stream_executor/cuda/ptxas_utils.cc:83] Couldn't invoke /usr/local/cuda/bin/ptxas --version
2020-09-03 21:33:59.154034: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:01.168701: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-09-03 21:34:32.605687: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:36.945777: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:37.705258: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:44.310620: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.359441: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.717019: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.723566: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.803360: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.835004: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.873237: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:46.879784: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:34:50.410013: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:35:00.258402: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:36:59.440165: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:37:29.503268: E tensorflow/core/platform/posix/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-09-03 21:37:40.754483: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Killed


After adding an 8 GB swap file in addition to the existing zram swap, the situation seems to have improved and training started.
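Since the earlier run ended with "Cannot allocate memory" and "Killed", it may help to check the available RAM and swap before starting a long training run. A rough sketch that parses the text of /proc/meminfo; the idea of requiring a fixed amount of headroom is my own assumption, not a figure from the tutorial:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def has_headroom(info, required_mb):
    """True if available RAM plus free swap covers required_mb."""
    available_kb = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return available_kb >= required_mb * 1024
```

On the device this would be used as `has_headroom(parse_meminfo(open("/proc/meminfo").read()), required_mb)`, with the required amount chosen by experiment.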

Following the alternative Google AI procedure:
Attempt #1:
training has started from the uploaded dataset.