Symbol resolution conflicts with Triton Server for JetPack TensorFlow backend (gRPC, protobuf, absl, etc.)

Hi All -

I’m working with a Jetson Orin Nano devkit running Jetson Linux with JetPack installed. My goal is to use Triton Server on this platform within a C++ application via the In-Process API to run TensorFlow models. Since this is a Jetson platform, I’m using Triton Server from the JetPack-specific Triton tarball.

However, I’ve run into a tricky symbol resolution issue with this setup that has me totally blocked.

Our application makes use of a framework built on gRPC, Protobuf, and associated dependencies like absl. On Ubuntu 20.04 (Focal) we can’t use the system versions of those libraries since they are too old (or non-existent), so we build our own versions into /usr/local. I expect though that we would encounter the same problems described below if we used the system versions.

The TensorFlow support libraries that come with the Triton Server installation also make use of these libraries, but they are “baked in” to the libtensorflow_cc.so.2 and libtensorflow_framework.so.2 libraries in backends/tensorflow.

Unfortunately, it seems that when Triton Server dlopens the TensorFlow support libraries the versions of the symbols from our framework are being used preferentially to the baked in versions of those symbols contained in the TensorFlow backend shared libraries, leading to not-so-subtle crashes while loading a TF model from the model repository during TRITONSERVER_ServerNew.

To test this, I created a simple toy program that linked against libtritonserver.so and did not use our framework, and used LD_DEBUG to explore symbol resolution. In the resulting logs, it is clear that the TensorFlow library correctly resolves things like its protobuf symbols “internally” when our framework is not present:

19:     binding file /opt/tritonserver/backends/tensorflow/libtensorflow_framework.so.2 [0] to /opt/tritonserver/backends/tensorflow/libtensorflow_framework.so.2 [0]: normal symbol `_ZNK6google8protobuf16RepeatedPtrFieldINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE4dataEv'
19:     binding file /opt/tritonserver/backends/tensorflow/libtensorflow_cc.so.2 [0] to /opt/tritonserver/backends/tensorflow/libtensorflow_framework.so.2 [0]: normal symbol `_ZNK6google8protobuf16RepeatedPtrFieldINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE4dataEv'

Run like this, the toy program is able to load the TF models from the model repository.
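For concreteness, the toy program amounts to little more than the minimal In-Process API bring-up sketched below (the model repository path is a placeholder and error handling is reduced to aborting on the first failure), run as LD_DEBUG=bindings ./toy 2> bindings.log to capture the binding decisions quoted above:

    // toy.cc -- links against libtritonserver.so only; no framework involved.
    // Backend/model paths are placeholders for the JetPack tarball layout here.
    #include <cstdio>
    #include <cstdlib>

    #include "tritonserver.h"  // from the tarball's include directory

    static void Check(TRITONSERVER_Error* err, const char* what) {
      if (err != nullptr) {
        std::fprintf(stderr, "%s: %s\n", what, TRITONSERVER_ErrorMessage(err));
        TRITONSERVER_ErrorDelete(err);
        std::exit(1);
      }
    }

    int main() {
      TRITONSERVER_ServerOptions* options = nullptr;
      Check(TRITONSERVER_ServerOptionsNew(&options), "ServerOptionsNew");
      Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, "/path/to/models"),
            "SetModelRepositoryPath");
      Check(TRITONSERVER_ServerOptionsSetBackendDirectory(options, "/opt/tritonserver/backends"),
            "SetBackendDirectory");

      // ServerNew scans the model repository and dlopen()s the TensorFlow backend;
      // this is where the crash happens once our framework's libraries are present.
      TRITONSERVER_Server* server = nullptr;
      Check(TRITONSERVER_ServerNew(&server, options), "ServerNew");

      TRITONSERVER_ServerOptionsDelete(options);
      TRITONSERVER_ServerDelete(server);
      return 0;
    }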

I then added a link dependency on our framework to the toy program, which pulled in a transitive dependency on the Protobuf shared library we built into /usr/local. After doing so, the LD_DEBUG output shows that the Triton TensorFlow backend is now incorrectly resolving its internal symbols against the Protobuf library from /usr/local:

20:	binding file /opt/tritonserver/backends/tensorflow/libtensorflow_cc.so.2 [0] to /usr/local/lib/libprotobuf.so.32 [0]: normal symbol `_ZNK6google8protobuf16RepeatedPtrFieldINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE4dataEv'

This run, where the only difference is the presence (not even invocation) of our framework library and its transitive dependencies, crashes badly during model loading.

This is more or less doomed to fail since there is no assurance that the version of Protobuf (or gRPC, or absl, etc.) that our framework uses (whether that is system or custom) matches what the TritonServer tensorflow backend libraries were built with.

From here I have several questions and potential paths forward, and I could use some guidance on which might be worth pursuing:

  • One solution would be for us to build our framework using the same versions of the libraries that the Triton Server build of TensorFlow used. However, the Triton Server for JetPack tarball does not contain headers, libraries, or build system integrations (e.g. .cmake or pkg-config files) for the associated libraries, so that choice seems ruled out. I did see that the “py3-sdk” Docker image appears to contain versions of absl, protobuf, etc. under workspace/install/{lib,include}, so maybe that will be an option once JetPack-compatible Triton Server Docker images are made available? However, those installations look somewhat incomplete (missing at least some build integration files under lib/cmake) and are static, not dynamic. In any event, it doesn’t solve the problem now for Triton for JetPack. Are those libraries provided explicitly to address this problem? Is the non-availability of those libraries in the JetPack version an oversight? Ditto the missing build system integration support files?

  • It feels to me like this conflict is an implementation detail leaking from Triton Server into the application. I would not expect Triton Server to allow backends (or, potentially, itself?) to resolve symbols from the application or the application’s transitive dependencies. It makes me wonder whether Triton Server should internally be using dlmopen to load backends, putting each on its own link map via LM_ID_NEWLM (see the sketch after this list). If I’ve found the right part of the code, it currently just uses RTLD_LOCAL, which prevents libraries subsequently loaded by the application from resolving symbols in the backend library, but does not prevent the backend library from resolving symbols against those already loaded on the application’s main link map. Should I file an issue on GitHub to suggest dlmopen with LM_ID_NEWLM for loading backends?

  • I could have our application itself use dlmopen to load libtritonserver.so with LM_ID_NEWLM and then use dlsym to pull out each TRITONSERVER_ symbol that I need (again, see the sketch after this list). While inconvenient, I think that would isolate the entire tree of symbols/dependencies transitively rooted at libtritonserver.so from the application’s symbols. However, I’m not entirely confident that this isolation extends to dlopen calls made from within those libraries. If it does, I think this approach might work, and I plan to try it. It is, however, not a great experience for using the Triton SDK API, which by all appearances has done a very good job of creating a clean separation between users of the SDK and the backends.

  • I could try to build my own TensorFlow backend, and arrange for the TensorFlow build part of it to use our versions of gRPC and friends. Or vice versa, I could build our framework against TensorFlow’s versions. However, I’m not confident that I’m going to succeed in building such a backend, and, if I’m going to the trouble to build TF myself, it seems like maybe a better choice to switch my application to just using TF directly, rather than using Triton. Doubly so given that it seems that I need to use a patched version of TF in order to have it integrate with Triton. My aim in using Triton was in many ways to avoid needing to manage the build of the inference backends.

  • I could try to build all of Triton Server from source, and hope to get it and its TensorFlow backend to build against the same versions of gRPC and friends. This feels quite daunting, even more so than building just TensorFlow, and it is also contrary to what I was hoping to achieve by adopting Triton Server as my inference engine.
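To make the dlmopen idea from the second and third bullets concrete, the application-side variant I have in mind looks roughly like the sketch below. The library path is a placeholder, the function-pointer type is abbreviated (real code would use the typedefs from tritonserver.h), and I have not yet verified this on the Orin:

    // Load libtritonserver.so into a fresh link-map namespace so that neither it
    // nor anything it later dlopen()s can bind against the application's copies
    // of protobuf/gRPC/absl. glibc-specific (dlmopen, LM_ID_NEWLM); build with -ldl.
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      void* triton = dlmopen(LM_ID_NEWLM,
                             "/opt/tritonserver/lib/libtritonserver.so",
                             RTLD_NOW | RTLD_LOCAL);
      if (triton == nullptr) {
        std::fprintf(stderr, "dlmopen failed: %s\n", dlerror());
        return 1;
      }

      // Pull out each TRITONSERVER_ entry point the application needs and drive
      // the In-Process API only through these pointers.
      using OptionsNewFn = void* (*)(void**);
      auto options_new = reinterpret_cast<OptionsNewFn>(
          dlsym(triton, "TRITONSERVER_ServerOptionsNew"));
      if (options_new == nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
      }
      // ... dlsym the rest (TRITONSERVER_ServerNew, TRITONSERVER_ServerInferAsync,
      // etc.) and proceed as if this were an ordinary link-time dependency.
      return 0;
    }

My understanding is that dlopen calls made from code already living in a non-base namespace load into that same namespace, which is what would keep the TensorFlow backend away from our Protobuf, but I still need to confirm that, and I know dlmopen has rough edges of its own (a small fixed number of namespaces and some long-standing TLS-related issues).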

Overall, those are the ways I can currently see to move forward from here, but I’m definitely open to other suggestions or thoughts on how I can resolve this issue.

Thanks, with apologies for the (very) long and complex question,
Andrew

Hi,

Back to your question: which custom dependencies do you require?
Do you need a gRPC version that is not compatible with Jetson’s Triton Server?

Thanks.

One solution would be for us to build our framework using the same versions of the libraries as the Triton Server build of TensorFlow did.
Triton Server is an open-source library.

Yes, I’m aware, but building Triton is a fairly complex endeavor, and we had hoped to make use of the containers when available. Also, the build instructions suggest that the TensorFlow backend may have a different build procedure, and it is, I believe, that library which exhibits the conflicts.

  • Should I file an issue on GitHub to suggest dlmopen with LM_ID_NEWLM for loading backends
    Sure, that way you can discuss with the Triton team how backends should be loaded.

Thanks, I filed “Triton library does not fully insulate applications from backend symbols” (triton-inference-server/server issue #6221) on GitHub.

Back to your question: which custom dependencies do you require?

Our application requires at least gRPC, abseil, and protobuf, along with a few others.

Do you need a gRPC version that is not compatible with Jetson’s Triton Server?

Effectively, yes. But if Triton Server’s In-Process API is going to require that I share a version of the above libraries with it, then it should include a complete distribution of those libraries so that I can build my software against the same versions that Triton Server uses. This requirement extends to backend libraries that are dlopen’d, like the tensorflow_cc and tensorflow_backend libraries.
