Error building with TF

Hello,

I am trying to compile deepmind/reverb on my Jetson Xavier NX. The Bazel build succeeds, but I get an error when I import the package:

>>> import reverb
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.6/site-packages/reverb/__init__.py", line 27, in <module>
    from reverb import item_selectors as selectors
  File "/home/ubuntu/.local/lib/python3.6/site-packages/reverb/item_selectors.py", line 19, in <module>
    from reverb import pybind
  File "/home/ubuntu/.local/lib/python3.6/site-packages/reverb/pybind.py", line 1, in <module>
    import tensorflow as _tf; from .libpybind import *; del _tf
ImportError: /home/ubuntu/.local/lib/python3.6/site-packages/reverb/libschema_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message25InitializationErrorStringEv

I found that libschema_cc_proto.so depends on libtensorflow_framework.so.2, which I do have installed via pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v45 tensorflow.

With bazel test I get more errors:

bazel-out/aarch64-opt/bin/_solib_aarch64/libreverb_Scc_Slibtensor_Ucompression.so: error: undefined reference to 'tensorflow::TfCheckOpHelperOutOfLine(tensorflow::Status const&, char const*)'
bazel-out/aarch64-opt/bin/_solib_aarch64/_U_S_Sreverb_Scc_Cschema_Ucc_Uproto___Ureverb_Scc/libschema_cc_proto.so: error: undefined reference to 'google::protobuf::internal::fixed_address_empty_string'
bazel-out/aarch64-opt/bin/_solib_aarch64/_U_S_Sreverb_Scc_Cschema_Ucc_Uproto___Ureverb_Scc/libschema_cc_proto.so: error: undefined reference to 'google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
bazel-out/aarch64-opt/bin/_solib_aarch64/_U_S_Sreverb_Scc_Cschema_Ucc_Uproto___Ureverb_Scc/libschema_cc_proto.so: error: undefined reference to 'google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
bazel-out/aarch64-opt/bin/_solib_aarch64/_U_S_Sreverb_Scc_Cschema_Ucc_Uproto___Ureverb_Scc/libschema_cc_proto.so: error: undefined reference to 'google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
bazel-out/aarch64-opt/bin/_solib_aarch64/_U_S_Sreverb_Scc_Cschema_Ucc_Uproto___Ureverb_Scc/libschema_cc_proto.so: error: undefined reference to 'google::protobuf::Message::GetTypeName() const'
bazel-out/aarch64-opt/bin/_solib_aarch64/_U_S_Sreverb_Scc_Cschema_Ucc_Uproto___Ureverb_Scc/libschema_cc_proto.so: error: undefined reference to 'google::protobuf::Message::InitializationErrorString() const'
bazel-out/aarch64-opt/bin/_solib_aarch64/libreverb_Scc_Slibchunk_Ustore.so: error: undefined reference to 'tensorflow::Status::Status(tensorflow::error::Code, std::basic_string_view<char, std::char_traits<char> >, std::vector<tensorflow::StackFrame, std::allocator<tensorflow::StackFrame> >&&)'
bazel-out/aarch64-opt/bin/_solib_aarch64/libreverb_Scc_Slibchunk_Ustore.so: error: undefined reference to 'tensorflow::strings::StrCat(tensorflow::strings::AlphaNum const&)'
bazel-out/aarch64-opt/bin/reverb/cc/support/_objs/trajectory_util_test/trajectory_util_test.o:trajectory_util_test.cc:function std::string* tensorflow::internal::MakeCheckOpString<unsigned long, unsigned long>(unsigned long const&, unsigned long const&, char const*): error: undefined reference to 'tensorflow::internal::CheckOpMessageBuilder::NewString()'
bazel-out/aarch64-opt/bin/reverb/cc/support/_objs/trajectory_util_test/trajectory_util_test.o:trajectory_util_test.cc:function std::string* tensorflow::internal::MakeCheckOpString<long, long>(long const&, long const&, char const*): error: undefined reference to 'tensorflow::internal::CheckOpMessageBuilder::NewString()'
bazel-out/aarch64-opt/bin/reverb/cc/support/_objs/trajectory_util_test/trajectory_util_test.o:trajectory_util_test.cc:function deepmind::reverb::FlatTrajectory deepmind::reverb::testing::CreateProto<deepmind::reverb::FlatTrajectory>(std::string const&): error: undefined reference to 'google::protobuf::TextFormat::ParseFromString(std::string const&, google::protobuf::Message*)'
bazel-out/aarch64-opt/bin/reverb/cc/support/_objs/trajectory_util_test/trajectory_util_test.o:trajectory_util_test.cc:function bool deepmind::reverb::testing::ProtoStringMatcher::MatchAndExplain<deepmind::reverb::FlatTrajectory>(deepmind::reverb::FlatTrajectory const&, testing::MatchResultListener*) const: error: undefined reference to 'google::protobuf::util::MessageDifferencer::ReportDifferencesToString(std::string*)'
bazel-out/aarch64-opt/bin/reverb/cc/support/_objs/trajectory_util_test/trajectory_util_test.o:trajectory_util_test.cc:function deepmind::reverb::test::internal::Expector<int, false>::Equal(tensorflow::Tensor const&, tensorflow::Tensor const&): error: undefined reference to 'tensorflow::TensorShapeRep::DebugString() const'
bazel-out/aarch64-opt/bin/reverb/cc/support/_objs/trajectory_util_test/trajectory_util_test.o:trajectory_util_test.cc:function deepmind::reverb::test::internal::Expector<int, false>::Equal(tensorflow::Tensor const&, tensorflow::Tensor const&): error: undefined reference to 'tensorflow::TensorShapeRep::DebugString() const'
collect2: error: ld returned 1 exit status
1626421737.823070820: src/main/tools/linux-sandbox-pid1.cc:410: wait returned pid=2, status=0x100
1626421737.823093989: src/main/tools/linux-sandbox-pid1.cc:428: child exited normally with code 1
1626421737.852817835: src/main/tools/linux-sandbox.cc:233: child exited normally with code 1
INFO: Elapsed time: 941.694s, Critical Path: 24.11s
INFO: 1231 processes: 246 internal, 985 linux-sandbox.
FAILED: Build did NOT complete successfully
//reverb/cc:chunk_store_test                                          NO STATUS
//reverb/cc:chunker_test                                              NO STATUS
//reverb/cc:client_test                                               NO STATUS
//reverb/cc:rate_limiter_test                                         NO STATUS
//reverb/cc:reverb_service_impl_test                                  NO STATUS
//reverb/cc:sampler_test                                              NO STATUS
//reverb/cc:table_test                                                NO STATUS
//reverb/cc:tensor_compression_test                                   NO STATUS
//reverb/cc:trajectory_writer_test                                    NO STATUS
//reverb/cc:writer_test                                               NO STATUS
//reverb/cc/platform:net_test                                         NO STATUS
//reverb/cc/platform:server_test                                      NO STATUS
//reverb/cc/platform:tfrecord_checkpointer_test                       NO STATUS
//reverb/cc/platform:thread_test                                      NO STATUS
//reverb/cc/selectors:fifo_test                                       NO STATUS
//reverb/cc/selectors:heap_test                                       NO STATUS
//reverb/cc/selectors:lifo_test                                       NO STATUS
//reverb/cc/selectors:prioritized_test                                NO STATUS
//reverb/cc/selectors:uniform_test                                    NO STATUS
//reverb/cc/support:cleanup_test                                      NO STATUS
//reverb/cc/support:intrusive_heap_test                               NO STATUS
//reverb/cc/support:periodic_closure_test                             NO STATUS
//reverb/cc/support:queue_test                                        NO STATUS
//reverb/cc/support:signature_test                                    NO STATUS
//reverb/cc/support:unbounded_queue_test                              NO STATUS
//reverb/cc/support:trajectory_util_test                        FAILED TO BUILD

Hi,

This looks like a compatibility issue to us.
Could you check which TensorFlow version is required for deepmind/reverb first?

Thanks.

deepmind/reverb r0.3.1 works with TF 2.5.0, which is available via pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v45 tensorflow. I'm building it with Bazel 3.7.2 arm64 directly on the Jetson Xavier NX.

I checked the TF paths found by Bazel during the build and they are correct:
DEBUG: /home/ubuntu/reverb/reverb/cc/platform/default/repo.bzl:29:10: TF include: /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/include
DEBUG: /home/ubuntu/reverb/reverb/cc/platform/default/repo.bzl:45:10: TF libs: /home/ubuntu/.local/lib/python3.6/site-packages/tensorflow

All versions and libraries match the requirements. On Ubuntu 18.04.5 LTS amd64 it works, but on aarch64 it does not. Is this a library path issue, or is something missing during the build? Or is it an incompatibility between the GCC version used to build TF 2.5.0 for NVIDIA Jetson and the one I use to build deepmind/reverb?

Hi,

_ZNK6google8protobuf7Message25InitializationErrorStringEv

This looks like a protobuf symbol.

Both TensorFlow and Reverb use protobuf.
Could you check whether they are compatible and use the same version?

Thanks.

Hi, I also tried the latest version of Reverb (master branch) with TF 2.5.0, but without success. I checked how Reverb uses protobuf: its protoc version is compared against the protobuf version that TF uses:

_CHECK_VERSION = """
PROTOC_VERSION=$$($(location @protobuf_protoc//:protoc_bin) --version \
  | cut -d' ' -f2 | sed -e 's/\\./ /g')
PROTOC_VERSION=$$(printf '%d%03d%03d' $${PROTOC_VERSION})
TF_PROTO_VERSION=$$(grep '#define PROTOBUF_MIN_PROTOC_VERSION' \
  $(location tf_includes/google/protobuf/port_def.inc) | cut -d' ' -f3)
if [ "$${PROTOC_VERSION}" -ne "$${TF_PROTO_VERSION}" ]; then
  echo !!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1>&2
  echo Your protoc version does not match the tensorflow proto header \
       required version: "$${PROTOC_VERSION}" vs. "$${TF_PROTO_VERSION}" 1>&2
  echo Please update the PROTOC_VERSION in your WORKSPACE file. 1>&2
  echo !!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1>&2
  false
else
  touch $@
fi
"""
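
The arithmetic in this check can be mirrored in a few lines of Python. This is only a sketch, not Reverb's code: the function names are hypothetical, and it assumes a three-part version string such as the 3.9.0 set in WORKSPACE. The printf '%d%03d%03d' step packs major, minor and patch into one integer, which is then compared with PROTOBUF_MIN_PROTOC_VERSION from TF's port_def.inc.

```python
def encode_protoc_version(version: str) -> int:
    """Pack 'major.minor.patch' the same way printf '%d%03d%03d' does."""
    major, minor, patch = (int(part) for part in version.split("."))
    return int("%d%03d%03d" % (major, minor, patch))


def versions_match(protoc_version: str, tf_proto_version: int) -> bool:
    """True when the protoc version equals the value from TF's header."""
    return encode_protoc_version(protoc_version) == tf_proto_version


# Hypothetical header value used only for illustration.
print(encode_protoc_version("3.9.0"), versions_match("3.9.0", 3009000))
```

Note that a check of this kind only compares version numbers. It says nothing about the C++ ABI that either library was compiled with.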

Result

ubuntu@nvidia-01:~/Desktop/Projects/reverb$ python3 configure.py 
Please specify the location of python. [Default is /usr/bin/python3]: 


Found possible Python library paths:
  /usr/local/lib/python3.6/dist-packages
  /usr/lib/python3.6/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.6/dist-packages]

ubuntu@nvidia-01:~/Desktop/Projects/reverb$ bazel build -c opt --copt="-march=armv8-a+crypto"  reverb/pip_package:build_pip_package
INFO: Analyzed target //reverb/pip_package:build_pip_package (73 packages loaded, 8975 targets configured).
INFO: Found 1 target...
Target //reverb/pip_package:build_pip_package up-to-date:
  bazel-bin/reverb/pip_package/build_pip_package
INFO: Elapsed time: 4647.716s, Critical Path: 2745.83s
INFO: 1396 processes: 30 internal, 1366 linux-sandbox.
INFO: Build completed successfully, 1396 total actions

ubuntu@nvidia-01:~/Desktop/Projects/reverb$ ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
Thu Jul 22 20:51:11 CEST 2021 : === Preparing sources in dir: /tmp/tmp.cllUJKIBr8
Thu Jul 22 20:51:11 CEST 2021 Setting PYTHON_BIN_PATH equal to what was set with configure.py.
Thu Jul 22 20:51:11 CEST 2021 : === Building wheel
Thu Jul 22 20:51:18 CEST 2021 : === Output wheel file is in: /tmp/reverb/dist/

ubuntu@nvidia-01:~/Desktop/Projects/reverb$ pip3 install /tmp/reverb/dist/dm_reverb-0.3.0rc0-cp36-cp36m-linux_aarch64.whl 
Defaulting to user installation because normal site-packages is not writeable
Processing /tmp/reverb/dist/dm_reverb-0.3.0rc0-cp36-cp36m-linux_aarch64.whl
Requirement already satisfied: dataclasses in /usr/local/lib/python3.6/dist-packages (from dm-reverb==0.3.0rc0) (0.8)
Requirement already satisfied: portpicker in /home/ubuntu/.local/lib/python3.6/site-packages (from dm-reverb==0.3.0rc0) (1.4.0)
Requirement already satisfied: dm-tree in /home/ubuntu/.local/lib/python3.6/site-packages (from dm-reverb==0.3.0rc0) (0.1.6)
Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.6/dist-packages (from dm-tree->dm-reverb==0.3.0rc0) (1.15.0)
Installing collected packages: dm-reverb
Successfully installed dm-reverb-0.3.0rc0
ubuntu@nvidia-01:~/Desktop/Projects/reverb$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import reverb
2021-07-22 20:54:12.883628: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.6/site-packages/reverb/__init__.py", line 27, in <module>
    from reverb import item_selectors as selectors
  File "/home/ubuntu/.local/lib/python3.6/site-packages/reverb/item_selectors.py", line 19, in <module>
    from reverb import pybind
  File "/home/ubuntu/.local/lib/python3.6/site-packages/reverb/pybind.py", line 1, in <module>
    import tensorflow as _tf; from .libpybind import *; del _tf
ImportError: /home/ubuntu/.local/lib/python3.6/site-packages/reverb/libschema_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message25InitializationErrorStringEv

Changes

These are the changes to the repo that are needed for a successful build:

diff --git a/.bazelrc b/.bazelrc
index f4b08d3..6e91bba 100644
--- a/.bazelrc
+++ b/.bazelrc
@@ -19,7 +19,7 @@ build --copt="-Wall" --copt="-Wno-sign-compare"
 build --linkopt="-lrt -lm"
 # We build with AVX and eigen byte alignment to match tensorflow's (and Eigen)
 # pip package byte alignment.  See b/186669968 for more details.
-build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
 
 # TF isn't built in dbg mode, so our dbg builds will segfault due to inconsistency
 # of defines when using tf's headers.  In particular in refcount.h.
diff --git a/WORKSPACE b/WORKSPACE
index 14c7b0e..da6d816 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -14,7 +14,7 @@ workspace(name = "reverb")
 # *WARNING* If using the REVERB_PROTOC_VERSION environment variable, sha256
 # checking is disabled.  Use at your own risk.
 PROTOC_VERSION = "3.9.0"
-PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
 
 load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
 
diff --git a/oss_build.sh b/oss_build.sh
index 40e07fd..d338f17 100644
--- a/oss_build.sh
+++ b/oss_build.sh
@@ -114,10 +114,10 @@ for python_version in $PYTHON_VERSIONS; do
   # someone's system unexpectedly. We are executing the python tests after
   # installing the final package making this approach satisfactory.
   # TODO(b/157223742): Execute Python tests as well.
-  bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+  bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
 
   # Builds Reverb and creates the wheel package.
-  bazel build -c opt --copt=-mavx --config=manylinux2010 reverb/pip_package:build_pip_package
+  bazel build -c opt --copt="-march=armv8-a+crypto"  reverb/pip_package:build_pip_package
   ./bazel-bin/reverb/pip_package/build_pip_package --dst $OUTPUT_DIR $PIP_PKG_EXTRA_ARGS
 
   # Installs pip package.
diff --git a/reverb/cc/platform/default/repo.bzl b/reverb/cc/platform/default/repo.bzl
index 1daac52..62c91b7 100644
--- a/reverb/cc/platform/default/repo.bzl
+++ b/reverb/cc/platform/default/repo.bzl
@@ -331,7 +331,7 @@ def _reverb_protoc_archive(ctx):
         version = override_version
 
     urls = [
-        "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+        "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
     ]
     ctx.download_and_extract(
         url = urls,
diff --git a/reverb/pip_package/build_pip_package.sh b/reverb/pip_package/build_pip_package.sh
index f15b83d..3f68d8d 100755
--- a/reverb/pip_package/build_pip_package.sh
+++ b/reverb/pip_package/build_pip_package.sh
@@ -32,7 +32,7 @@ function build_wheel() {
   pushd ${TMPDIR} > /dev/null
 
   echo $(date) : "=== Building wheel"
-  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
   DEST=${TMPDIR}/dist/
   if [[ ! "$TMPDIR" -ef "$DESTDIR" ]]; then
     mkdir -p ${DESTDIR}

Reverb directly uses the same protobuf headers as TF:

cc_library(
    name = "includes",
    data = [":versions_compared"],
    hdrs = glob([
        "tf_includes/google/protobuf/*.h",
        "tf_includes/google/protobuf/*.inc",
        "tf_includes/google/protobuf/**/*.h",
        "tf_includes/google/protobuf/**/*.inc",
    ]),
    includes = ["tf_includes"],
    visibility = ["//visibility:public"],
)

Hi,

Have you checked this issue with the Reverb team?
Since the dependency comes from Reverb, they may know more about the support status.

Thanks.

I first opened a GitHub issue and also asked on Stack Overflow, but there has been no activity so far.

@AastaLLL

The issue is in your version of TF provided by sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v46 tensorflow.

The library appears to be incomplete: for example, it is missing _ZNK6google8protobuf7Message25InitializationErrorStringEv.

You can check this with objdump -x /usr/local/lib/python3.6/dist-packages/tensorflow/libtensorflow_framework.so.2 | grep '_ZNK6google8protobuf7Message25InitializationErrorStringEv'. It returns nothing.

Could you share the instructions you use to build this variant of TensorFlow, or publish a complete version of TF (like the standard TF distributed on pypi.org)?

Hi,

I just checked the protobuf source.
It seems that google::protobuf::Message::InitializationErrorString() is present in all versions.
So it is unlikely that TensorFlow uses a protobuf library without this function.
Moreover, protobuf is a stand-alone library rather than being included in the TF package.

ImportError: /home/ubuntu/.local/lib/python3.6/site-packages/reverb/libschema_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message25InitializationErrorStringEv

Back to the original error: the Reverb library complains about a missing protobuf symbol.
Have you checked that you include and link the protobuf library correctly?

If you want to test a custom TensorFlow build, you can build it from source directly.
The instructions can be found in the following GitHub:

Thanks.

@AastaLLL

Reverb uses the protobuf that is contained directly in libtensorflow_framework.so.2.

I have now found the symbol there:

ubuntu@nvidia-01:~$ objdump -C  -x /usr/local/lib/python3.6/dist-packages/tensorflow/libtensorflow_framework.so.2  | grep InitializationErrorString
0000000000e26478 l     F .text	0000000000000040              google::protobuf::MessageLite::InitializationErrorString[abi:cxx11]() const [clone .localalias.41]
0000000000d8b4a0 g     F .text	00000000000001e0              google::protobuf::Message::InitializationErrorString[abi:cxx11]() const
0000000000e26478 g     F .text	0000000000000040              google::protobuf::MessageLite::InitializationErrorString[abi:cxx11]() const
ubuntu@nvidia-01:~$
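
A listing like this can also be filtered programmatically. The following is a minimal sketch that assumes the column layout shown above (address, 7-character flags field, section, size, demangled name); the regular expression and the function name are hypothetical and have not been checked against every objdump version.

```python
import re

# Assumed layout: <addr> <7-char flags> <section> <size> <name>
_SYMBOL_LINE = re.compile(
    r"^(?P<addr>[0-9a-f]+)\s(?P<flags>.{7})\s(?P<section>\S+)\s+"
    r"(?P<size>[0-9a-f]+)\s+(?P<name>.+)$"
)


def find_symbols(objdump_text, needle):
    """Return symbol-table rows whose demangled name contains `needle`."""
    hits = []
    for line in objdump_text.splitlines():
        match = _SYMBOL_LINE.match(line.strip())
        if match and needle in match.group("name"):
            hits.append({
                "binding": match.group("flags")[0],  # 'g' global, 'l' local
                "section": match.group("section"),
                "name": match.group("name"),
            })
    return hits


# The three rows from the objdump output above.
SAMPLE = (
    "0000000000e26478 l     F .text\t0000000000000040              "
    "google::protobuf::MessageLite::InitializationErrorString[abi:cxx11]() const [clone .localalias.41]\n"
    "0000000000d8b4a0 g     F .text\t00000000000001e0              "
    "google::protobuf::Message::InitializationErrorString[abi:cxx11]() const\n"
    "0000000000e26478 g     F .text\t0000000000000040              "
    "google::protobuf::MessageLite::InitializationErrorString[abi:cxx11]() const\n"
)

for row in find_symbols(SAMPLE, "Message::InitializationErrorString"):
    print(row["binding"], row["section"], row["name"])
```

Searching by the demangled function name rather than by one exact mangled string also finds variants that carry an ABI tag such as [abi:cxx11].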

But I still don't know why Reverb cannot resolve it when it is imported into Python, even though the build itself succeeds. I think it is caused by different compilers being used to build TF and to build Reverb, because the function names in the .so files differ.
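
The naming difference can be made concrete. The symbol in the ImportError carries no ABI tag, while objdump shows the exported function as InitializationErrorString[abi:cxx11]. In the Itanium C++ name mangling used by GCC, such a tag is encoded as B5cxx11 after the function name. The sketch below uses a hypothetical helper that only handles simple nested names like this one; the tagged mangled string is derived from that encoding rule, not copied from the library, so treat it as an assumption.

```python
import re


def split_nested_name(mangled):
    """Return (components, abi_tags) for a simple '_ZN...E' mangled name."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name: %r" % mangled)
    pos = 3
    if mangled[pos:pos + 1] == "K":  # 'K' marks a const member function
        pos += 1
    components, abi_tags = [], []
    while mangled[pos:pos + 1] != "E":
        is_tag = mangled[pos:pos + 1] == "B"  # 'B<len><name>' is an ABI tag
        if is_tag:
            pos += 1
        match = re.match(r"\d+", mangled[pos:])
        if match is None:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(match.group())
        pos += len(match.group())
        (abi_tags if is_tag else components).append(mangled[pos:pos + length])
        pos += length
    return components, abi_tags


# Symbol from the ImportError (no ABI tag).
WANTED_BY_REVERB = "_ZNK6google8protobuf7Message25InitializationErrorStringEv"
# Assumed tagged form of the same function (derived, not taken from the thread).
TAGGED_FORM = "_ZNK6google8protobuf7Message25InitializationErrorStringB5cxx11Ev"

for symbol in (WANTED_BY_REVERB, TAGGED_FORM):
    names, tags = split_nested_name(symbol)
    print("::".join(names), "abi tags:", tags)
```

Both strings name the same function, but to the dynamic linker they are different symbols, so a library that exports only one form cannot satisfy a reference to the other.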

Thanks, I’ll try to build TF on my own.

@AastaLLL

I cannot build TF 2.5.0 following your tutorial. I get an error:

ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/cfc230b92457d94a7c5525256f32b9ce/external/cub_archive/BUILD.bazel:11:11: no such package '@local_cuda//': The repository '@local_cuda' could not be resolved and referenced by '@cub_archive//:cub'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: Analysis failed
INFO: Elapsed time: 19.181s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (321 packages loaded, 9403 targets configured)

@AastaLLL

In Reverb's .bazelrc, instead of:

build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"

you need to set:

build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"

Now everything works. Thanks.
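
For anyone hitting the same problem: the value this flag needs can be read from the installed TensorFlow instead of guessed. A minimal sketch, assuming that tf.sysconfig.get_compile_flags() reports a -D_GLIBCXX_USE_CXX11_ABI=... entry for your build; the two helper names are hypothetical.

```python
def cxx11_abi_from_flags(compile_flags):
    """Extract the _GLIBCXX_USE_CXX11_ABI value from a list of compiler flags."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None  # the flag was not reported


def bazelrc_line(abi):
    """Format the matching .bazelrc line for Reverb."""
    return 'build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=%d"' % abi


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed; nothing to inspect")
    else:
        abi = cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
        print(bazelrc_line(abi) if abi is not None else "ABI flag not reported")
```

If the printed line differs from the one in Reverb's .bazelrc, the two libraries are being compiled against different std::string ABIs, which is consistent with the undefined-symbol error earlier in this thread.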

Thanks for updating this.
Good to know it finally works!