Create inference graph failed on AGX Xavier

Hi AastaLLL,

Please find the models on the google drive.

Plus, a question about the TRT_object_detection dependencies.

I got an error when running pip3 install numpy pycuda --user:
In file included from src/cpp/cuda.cpp:4:0:
src/cpp/cuda.hpp:14:10: fatal error: cuda.h: No such file or directory
#include <cuda.h>
^~~~~~~~
compilation terminated.
error: command 'aarch64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for pycuda

This happened after installing TensorFlow with:
https://developer.download.nvidia.com/compute/redist/jp/v43 tensorflow==1.15.2+nv20.3 --user

About the TensorRT sample:

Is there a way to convert a TF model to a frozen_inference_graph.pb?

Thank you for any advice,

Hi,

1.

Please install the TensorRT Python package from SDK Manager.
If you want to do it manually, please install the Python-based packages as follows:

python3-libnvinfer_6.0.1-1+cuda10.0_arm64.deb
python3-libnvinfer-dev_6.0.1-1+cuda10.0_arm64.deb

2. pyCUDA error:

Please check this issue for information:

It looks like your device doesn't have all the required packages installed, e.g. CUDA, cuDNN, and TensorRT.
It's recommended to first check that you installed the "components" part after reflashing the device from SDK Manager.

Thanks.

Hi AastaLLL,

Thank you for your support.
PyCUDA is installed now.

I actually used SDK Manager to reinstall the whole AGX Xavier system, including DeepStream.

However, I still got the error (#20):
OSError: libnvinfer.so.5: cannot open shared object file: No such file or directory

~/TRT_object_detection$ python3 main.py
2020-04-20 18:58:05.406758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Traceback (most recent call last):
File "main.py", line 18, in
ctypes.CDLL("lib/libflattenconcat.so")
File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnvinfer.so.5: cannot open shared object file: No such file or directory

libnvinfer.so.6 can be found in /usr/lib/aarch64-linux-gnu, but not .so.5.
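To diagnose which TensorRT runtime the dynamic loader can actually resolve, a minimal stdlib-only sketch like the following can help (the SONAME list here is illustrative, and no TensorRT API is called, so it runs even when the library is absent):

```python
import ctypes

# Try each TensorRT runtime SONAME in turn and report which ones load.
# The version list is illustrative, not exhaustive.
for soname in ("libnvinfer.so.5", "libnvinfer.so.6"):
    try:
        ctypes.CDLL(soname)
        print(soname, "-> loadable")
    except OSError as err:
        print(soname, "-> not loadable:", err)
```

On a JetPack 4.3 system this would typically report .so.6 as loadable and .so.5 as missing, which matches the error above.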

My questions are:

  1. Is it recommended to install libnvinfer.so.5 manually?
  2. How should I install the dependencies of TRT_object_detection?
    https://developer.download.nvidia.com/compute/redist/jp/v43 tensorflow==1.15.2+nv20.3 --user
    or
    https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.5 --user

Thank you,

Hi,

Please stay with JetPack 4.3 and update the plugin library as follows:

$ cd /usr/src/tensorrt/samples/python/uff_ssd/
$ sudo mkdir build
$ cd build/
$ sudo cmake ../
$ sudo make
$ cp libflattenconcat.so  {TRT_object_detection_ROOT}/lib/libflattenconcat.so

Thanks.

Hi AastaLLL,

Thanks for your great support.

It works.
The models can detect objects.

Is there a way to get frozen_inference_graph.pb for our models?
For example the model below.
https://drive.google.com/drive/folders/1z_lICNms-eZnJVc6kmpOz57vMs310O9K

Thank you,

Hi AastaLLL,

Is there a way to get frozen_inference_graph.pb for our models?

Does Jetpack 4.4 support tensorflow inception model?
Is there any sample to refer it?

Thank you,

Hi,

There is an unsupported Switch layer inside your model.
Is it used for the training stage?

If yes, could you remove the layer from the h5 file?
Thanks.

Hi AastaLLL,

Would you share with us which layer is not supported?
I assume you tested the Keras_inception model.
In this model's training, most of the layers are default Keras layers.

The layers we can remove are as below.
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

Which layer would you like to remove?

Attached is the training information.

Thank you,

Hi,

Sorry for the late update.

The unsupported operation is called Switch.
This is an operation-level layer that is added by Keras/TensorFlow in their implementation.

AFAIK, this operation is generally used in the training stage.
So it may be worthwhile to check whether you can remove the unnecessary training nodes by turning off the training phase before serialization.

K.set_learning_phase(0) 

In general, this is an auxiliary training-type operation.

Thanks.
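For the earlier question about producing a frozen_inference_graph.pb, a rough sketch of the usual TF 1.x freezing flow is below. It assumes a TF 1.x runtime (e.g. tensorflow==1.15.2+nv20.3 from the JetPack wheel index), and the h5 path and helper name are placeholders, not part of any official API:

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def freeze_keras_model(h5_path, pb_dir=".", pb_name="frozen_inference_graph.pb"):
    """Load a trained Keras model and write a frozen TF 1.x GraphDef.

    Sketch only: assumes a TF 1.x session-based runtime where
    K.get_session() is available.
    """
    # Build the graph in inference mode so training-only ops
    # (e.g. Switch) are not serialized.
    K.set_learning_phase(0)
    model = tf.keras.models.load_model(h5_path)

    sess = K.get_session()
    output_names = [out.op.name for out in model.outputs]

    # Fold variables into constants and write a frozen GraphDef.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_names)
    tf.io.write_graph(frozen, pb_dir, pb_name, as_text=False)

# Usage (placeholder path): freeze_keras_model("model.h5")
```

Because set_learning_phase(0) is called before the model is loaded, the restored graph is built without the training branch, which is what lets the converter drop the Switch nodes.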

Hi AastaLLL,

Thank you for your support.

However, it did not work after we added keras.backend.set_learning_phase(0).

Still got the same error. (see attached.)

The training time increased after we added keras.backend.set_learning_phase(0).

Thank you for any advice,

Hi,

Thanks for your testing.

Would you mind sharing the learning-phase-OFF model for us to check?
Thanks.

Hi AastaLLL,

Thank you so much for your great support.
We were already able to convert our model to TensorRT.
Sorry, we did not use the learning-phase-OFF model, but used a frozen model .pb file instead.

Thanks again,