TF-TRT conversion is broken on 32.7.1

The TensorFlow Docker image looks like it's broken.

My Dockerfile:

FROM nvcr.io/nvidia/l4t-tensorflow:r32.7.1-tf2.7-py3

RUN python3 -m pip install tensorflow_datasets

I build and run the container with:

docker build -t trt_example --file Dockerfile .
docker run -it --rm --gpus all  --privileged --name ""   --volume $PWD:/src:rw  trt_example  bash

I am then running this Python script:

import tensorflow as tf
import tensorflow_datasets as tfds

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)

model.save("saved_model")

But it crashes with:

e.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4321 MB memory:  -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2
Epoch 1/6
Traceback (most recent call last):
  File "main.py", line 43, in <module>
    validation_data=ds_test,
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1384, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 910, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 958, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 781, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3157, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3557, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3402, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 1143, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
AttributeError: in user code:

    File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 863, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/usr/local/lib/python3.6/dist-packages/keras/optimizer_v2/optimizer_v2.py", line 532, in minimize
        return self.apply_gradients(grads_and_vars, name=name)
    File "/usr/local/lib/python3.6/dist-packages/keras/optimizer_v2/optimizer_v2.py", line 668, in apply_gradients
        grads_and_vars = self._aggregate_gradients(grads_and_vars)
    File "/usr/local/lib/python3.6/dist-packages/keras/optimizer_v2/optimizer_v2.py", line 484, in _aggregate_gradients
        return self.gradient_aggregator(grads_and_vars)
    File "/usr/local/lib/python3.6/dist-packages/keras/optimizer_v2/utils.py", line 33, in all_reduce_sum_gradients
        if tf.__internal__.distribute.strategy_supports_no_merge_call():

    AttributeError: module 'tensorflow.compat.v2.__internal__.distribute' has no attribute 'strategy_supports_no_merge_call'
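
(As an aside, this AttributeError is the signature of a Keras/TensorFlow version mismatch: a keras package newer than the TensorFlow build in the image calls TF internals that the older build does not have. A minimal check, as a sketch:)

import tensorflow as tf
import keras

# The two versions should agree (e.g. 2.7.x with 2.7.x); if keras reports
# a newer version than tensorflow, something (possibly a pip dependency)
# upgraded it past the wheel NVIDIA ships in the image.
print("tensorflow:", tf.__version__)
print("keras:", keras.__version__)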

I tried downgrading Keras to 2.7.0 with this Dockerfile:

FROM nvcr.io/nvidia/l4t-tensorflow:r32.7.1-tf2.7-py3

RUN python3 -m pip install tensorflow_datasets keras==2.7.0

With that change, the first Python script runs, but it breaks during conversion:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf

converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")
converter.convert()
converter.save("output")

Hi,

What kind of error do you get when converting the model with TF-TRT?
We tried your source and only got the warning below:

2022-03-21 02:31:09.095717: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at trt_engine_resource_ops.cc:198 : NOT_FOUND: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_0_0)

Based on our documentation, this warning can be ignored:
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_19.11.html

  • The following warning is issued when the method build() from the API is not called. This warning can be ignored.
    OP_REQUIRES failed at trt_engine_resource_ops.cc:183 : Not found: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_...

  • The following warning is issued because internally TensorFlow calls the TensorRT optimizer for certain objects unnecessarily. This warning can be ignored.
    OP_REQUIRES failed at trt_engine_resource_ops.cc:183 : Not found: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_...
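
(Per the first bullet, the warning disappears if build() is called with a representative input before saving; a minimal sketch, where the input shape is an assumption based on the MNIST model earlier in the thread:)

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")
converter.convert()

# Yield one representative batch so the TRT engines are actually built
# before saving; (1, 28, 28) float32 matches the MNIST inputs above.
def input_fn():
    yield (np.zeros((1, 28, 28), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save("output")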

Thanks.

Did you try it in the container built from this Dockerfile:

FROM nvcr.io/nvidia/l4t-tensorflow:r32.7.1-tf2.7-py3

RUN python3 -m pip install tensorflow_datasets

If I install keras==2.7.0, I get this error:

ERROR:tensorflow:Loaded TensorRT 8.0.1 but linked TensorFlow against TensorRT 8.2.1. A few requirements must be met:
	-It is required to use the same major version of TensorRT during compilation and runtime.
	-TensorRT does not support forward compatibility. The loaded version has to be equal or more recent than the linked version.
Traceback (most recent call last):
  File "save_trt.py", line 4, in <module>
    converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 552, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1104, in __init__
    _check_trt_version_compatibility()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 262, in _check_trt_version_compatibility
    raise RuntimeError("Incompatible TensorRT major version")
RuntimeError: Incompatible TensorRT major version

Hi,

l4t-tensorflow:r32.7.1-tf2.7-py3 requires the software from JetPack 4.6.1, which should include TensorRT 8.2.1.

ERROR:tensorflow:Loaded TensorRT 8.0.1 but linked TensorFlow against TensorRT 8.2.1.

Based on your error, it seems that you are still using JetPack 4.6 and TensorRT 8.0.
Please upgrade your software to the latest JetPack, then try again.
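
(One way to confirm which TensorRT runtime a container actually sees, since on JetPack 4.x the NVIDIA runtime bind-mounts it from the host; a minimal diagnostic sketch using the C symbol getInferLibVersion exported by libnvinfer:)

import ctypes

# getInferLibVersion() returns the version encoded as
# major*1000 + minor*100 + patch, e.g. 8201 for TensorRT 8.2.1.
libnvinfer = ctypes.CDLL("libnvinfer.so.8")
libnvinfer.getInferLibVersion.restype = ctypes.c_int32
v = libnvinfer.getInferLibVersion()
print("TensorRT runtime: %d.%d.%d" % (v // 1000, (v % 1000) // 100, v % 100))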

Thanks.

The problem is the Docker setup: all of these steps run fine directly on the Xavier, installing TensorFlow 2.7.0 per NVIDIA's instructions, but inside Docker it doesn't work. I also think downgrading Keras to 2.5.0 breaks TensorFlow.

dpkg-query --show nvidia-l4t-core returns
nvidia-l4t-core 32.7.1-20220219090344

Hi,

May I know which JetPack version you are using?
Is it JetPack 4.6.1?

$ sudo apt show nvidia-jetpack

Thanks.

Running sudo apt show nvidia-jetpack returns:

Package: nvidia-jetpack
Version: 4.6.1-b110
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.6.1-b110), nvidia-opencv (= 4.6.1-b110), nvidia-cudnn8 (= 4.6.1-b110), nvidia-tensorrt (= 4.6.1-b110), nvidia-visionworks (= 4.6.1-b110), nvidia-container (= 4.6.1-b110), nvidia-vpi (= 4.6.1-b110), nvidia-l4t-jetson-multimedia-api (>> 32.7-0), nvidia-l4t-jetson-multimedia-api (<< 32.8-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29,4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.7/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Hi,

We tried the Dockerfile below and were able to generate the 'output' model successfully.

FROM nvcr.io/nvidia/l4t-tensorflow:r32.7.1-tf2.7-py3

RUN python3 -m pip install tensorflow_datasets keras==2.7.0

The only difference seems to be the command for launching the container.
Maybe you can also give it a try.

$ docker build -t trt_example --file Dockerfile .
$ sudo docker run -it --rm --runtime nvidia --network host -v ${PWD}:/home/nvidia/ -w /home/nvidia trt_example
...
################################################################################
TensorRT unsupported/non-converted OP Report:
        - NoOp -> 5x
        - Identity -> 1x
        - Placeholder -> 1x
--------------------------------------------------------------------------------
        - Total nonconverted OPs: 7
        - Total nonconverted OP Types: 3
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################

2022-03-29 05:58:18.311973: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:806] Number of TensorRT candidate segments: 1
2022-03-29 05:58:18.317686: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:919] Replaced segment 0 consisting of 10 nodes by TRTEngineOp_0_0.
2022-03-29 05:58:18.343048: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1176] Optimization results for grappler item: tf_graph
  constant_folding: Graph size after: 19 nodes (-8), 26 edges (-8), time = 27.541ms.
  layout: Graph size after: 19 nodes (0), 26 edges (0), time = 3.95ms.
  constant_folding: Graph size after: 19 nodes (0), 26 edges (0), time = 3.376ms.
  TensorRTOptimizer: Graph size after: 10 nodes (-9), 13 edges (-13), time = 11.946ms.
  constant_folding: Graph size after: 10 nodes (0), 13 edges (0), time = 2.343ms.
Optimization results for grappler item: TRTEngineOp_0_0_native_segment
  constant_folding: Graph size after: 16 nodes (0), 15 edges (0), time = 3.35ms.
  layout: Graph size after: 16 nodes (0), 15 edges (0), time = 3.753ms.
  constant_folding: Graph size after: 16 nodes (0), 15 edges (0), time = 3.436ms.
  TensorRTOptimizer: Graph size after: 16 nodes (0), 15 edges (0), time = 0.332ms.
  constant_folding: Graph size after: 16 nodes (0), 15 edges (0), time = 3.61ms.

2022-03-29 05:58:18.445023: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at trt_engine_resource_ops.cc:198 : NOT_FOUND: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_0_0)

Thanks.

I did exactly that and it doesn't work; changing the docker command makes no difference. Same error as before:

ERROR:tensorflow:Loaded TensorRT 8.0.1 but linked TensorFlow against TensorRT 8.2.1. A few requirements must be met:
	-It is required to use the same major version of TensorRT during compilation and runtime.
	-TensorRT does not support forward compatibility. The loaded version has to be equal or more recent than the linked version.
Traceback (most recent call last):
  File "save_trt.py", line 4, in <module>
    converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 552, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1104, in __init__
    _check_trt_version_compatibility()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 262, in _check_trt_version_compatibility
    raise RuntimeError("Incompatible TensorRT major version")
RuntimeError: Incompatible TensorRT major version

Maybe we have different JetPack installations? We are using:

docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb/ --volume /tmp/:/home/nvidia/nvidia:rw --name JetPack_TX2_Devkit sdkmanager --cli install --logintype devzone --product Jetson --target P2888-0001 --targetos Linux --version 4.6.1 --select 'Jetson OS' --deselect 'Jetson SDK Components' --flash all --license accept --staylogin true --datacollection disable --exitonfinish
cd /tmp/nvidia_sdk/JetPack_4.6.1_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/
sudo ./flash.sh jetson-agx-xavier-devkit mmcblk0p1

And this:

docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb/ --volume /tmp/:/home/nvidia/nvidia:rw --name JetPack_TX2_Devkit sdkmanager --cli install --logintype devzone --product Jetson --target P2888-0001 --targetos Linux --version 4.6.1 --deselect 'Jetson OS' --select 'Jetson SDK Components' --flash all --license accept --staylogin true --datacollection disable --exitonfinish

To flash and install.

Okay, I reinstalled only the base JetPack 4.6.1 and it works, but I don't understand why installing the way we normally do doesn't work. Here is a simplified list of the steps in our JetPack installation:

  1. We use the docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb/ --volume /tmp/:/home/nvidia/nvidia:rw --name JetPack_TX2_Devkit sdkmanager --cli install --logintype devzone --product Jetson --target P2888-0001 --targetos Linux --version 4.6.1 --select 'Jetson OS' --deselect 'Jetson SDK Components' --flash all --license accept --staylogin true --datacollection disable --exitonfinish command to download the JetPack image.
  2. We use a series of commands to patch the dtb so that the CAN clock is the PLLAON (guide: Update the device-tree for CAN-BUS, PLLAON Clock - #4 by peclatj).
  3. We use the command cd /tmp/nvidia_sdk/JetPack_4.6.1_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/; sudo ./flash.sh jetson-agx-xavier-devkit mmcblk0p1 to flash.
  4. We use the jetsonhacks/rootOnNVMe GitHub repo to move the rootfs to the NVMe SSD.
  5. We set up a service to start the CAN interface with the correct bitrate on boot.
  6. We use the docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb/ --volume /tmp/:/home/nvidia/nvidia:rw --name JetPack_TX2_Devkit sdkmanager --cli install --logintype devzone --product Jetson --target P2888-0001 --targetos Linux --version 4.6.1 --deselect 'Jetson OS' --select 'Jetson SDK Components' --flash all --license accept --staylogin true --datacollection disable --exitonfinish command to install the SDK components.
  7. We update Chromium with sudo apt install --only-upgrade chromium-browser.

Maybe moving the rootfs to the NVMe is breaking libraries? Is that possible?
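
(For anyone debugging the same setup, a quick check that the running rootfs has the JetPack package versions you expect; a hypothetical helper script, with package names taken from the apt output earlier in the thread:)

import subprocess

# Print installed versions of the relevant JetPack packages; on a clean
# JetPack 4.6.1 install these should report 4.6.1-b110 / 32.7.1.
for pkg in ("nvidia-jetpack", "nvidia-l4t-core", "nvidia-tensorrt", "nvidia-cudnn8"):
    out = subprocess.run(["dpkg-query", "--show", pkg],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(out.stdout.decode().strip() or pkg + ": not installed")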
