Symbol resolution conflicts with Triton Server for JetPack TensorFlow backend (gRPC, protobuf, absl, etc.)

Hi All -

I’m working with a Jetson Orin Nano devkit running Jetson Linux with JetPack installed. My goal is to use Triton Server on this platform within a C++ application via the In-Process API to run TensorFlow models. Since this is a Jetson platform, I’m using Triton Server from the JetPack-specific Triton tarball.

However, I’ve run into a tricky symbol resolution issue with this setup that has me totally blocked.

Our application makes use of a framework built on gRPC, Protobuf, and associated dependencies like absl. On Ubuntu 20.04 (Focal) we can’t use the system versions of those libraries since they are too old (or non-existent), so we build our own versions into /usr/local. I expect though that we would encounter the same problems described below if we used the system versions.

The TensorFlow support libraries that come with the Triton Server installation also make use of these libraries, but they are “baked in” to the libtensorflow_cc.so.2 and libtensorflow_framework.so.2 libraries in backends/tensorflow.

Unfortunately, it seems that when Triton Server dlopens the TensorFlow support libraries the versions of the symbols from our framework are being used preferentially to the baked in versions of those symbols contained in the TensorFlow backend shared libraries, leading to not-so-subtle crashes while loading a TF model from the model repository during TRITONSERVER_ServerNew.

To test this, I created a simple toy program that linked against libtritonserver.so and did not use our framework, and used LD_DEBUG to explore symbol resolution. In the resulting logs, it is clear that the TensorFlow library correctly resolves things like its protobuf symbols “internally” when our framework is not present:

19:     binding file /opt/tritonserver/backends/tensorflow/libtensorflow_framework.so.2 [0] to /opt/tritonserver/backends/tensorflow/libtensorflow_framework.so.2 [0]: normal symbol `_ZNK6google8protobuf16RepeatedPtrFieldINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE4dataEv'
19:     binding file /opt/tritonserver/backends/tensorflow/libtensorflow_cc.so.2 [0] to /opt/tritonserver/backends/tensorflow/libtensorflow_framework.so.2 [0]: normal symbol `_ZNK6google8protobuf16RepeatedPtrFieldINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE4dataEv'

Run like this, the toy program is able to load the TF models from the model repository.
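For concreteness, the toy program amounts to little more than the minimal In-Process API bring-up sketched below (the model repository path is a placeholder and error handling is reduced to aborting on the first failure), run as LD_DEBUG=bindings ./toy 2> bindings.log to capture the binding decisions quoted above:

    // toy.cc -- links against libtritonserver.so only; no framework involved.
    // Backend/model paths are placeholders for the JetPack tarball layout here.
    #include <cstdio>
    #include <cstdlib>

    #include "tritonserver.h"  // from the tarball's include directory

    static void Check(TRITONSERVER_Error* err, const char* what) {
      if (err != nullptr) {
        std::fprintf(stderr, "%s: %s\n", what, TRITONSERVER_ErrorMessage(err));
        TRITONSERVER_ErrorDelete(err);
        std::exit(1);
      }
    }

    int main() {
      TRITONSERVER_ServerOptions* options = nullptr;
      Check(TRITONSERVER_ServerOptionsNew(&options), "ServerOptionsNew");
      Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, "/path/to/models"),
            "SetModelRepositoryPath");
      Check(TRITONSERVER_ServerOptionsSetBackendDirectory(options, "/opt/tritonserver/backends"),
            "SetBackendDirectory");

      // ServerNew scans the model repository and dlopen()s the TensorFlow backend;
      // this is where the crash happens once our framework's libraries are present.
      TRITONSERVER_Server* server = nullptr;
      Check(TRITONSERVER_ServerNew(&server, options), "ServerNew");

      TRITONSERVER_ServerOptionsDelete(options);
      TRITONSERVER_ServerDelete(server);
      return 0;
    }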

I then added a link dependency on our framework to the toy program, which pulled in a transitive dependency on the Protobuf shared library we built into /usr/local. After doing so, the LD_DEBUG output shows that the Triton TensorFlow backend is now incorrectly resolving its internal symbols against the Protobuf library from /usr/local:

20:	binding file /opt/tritonserver/backends/tensorflow/libtensorflow_cc.so.2 [0] to /usr/local/lib/libprotobuf.so.32 [0]: normal symbol `_ZNK6google8protobuf16RepeatedPtrFieldINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE4dataEv'

This run, where the only difference is the presence (not even invocation) of our framework library and its transitive dependencies, crashes badly during model loading.

This is more or less doomed to fail since there is no assurance that the version of Protobuf (or gRPC, or absl, etc.) that our framework uses (whether that is system or custom) matches what the TritonServer tensorflow backend libraries were built with.

From here I have several questions and potential paths forward, and I could use some guidance on which might be worth pursuing:

  • One solution would be for us to build our framework using the same versions of the libraries that the Triton Server build of TensorFlow used. However, the Triton Server for JetPack tarball does not contain headers, libraries, or build system integrations (e.g. .cmake or pkg-config files) for the associated libraries, so that choice seems ruled out. I did see that the “py3-sdk” Docker image appears to contain versions of absl, protobuf, etc. under workspace/install/{lib,include}, so maybe that will be an option once JetPack-compatible Triton Server Docker images are made available? However, those installations look somewhat incomplete (missing at least some build integration files under lib/cmake) and are static, not dynamic. In any event, it doesn’t solve the problem now for Triton for JetPack. Are those libraries provided explicitly to address this problem? Is the non-availability of those libraries in the JetPack version an oversight? Ditto the missing build system integration support files?

  • It feels to me like this conflict is an implementation detail leaking from Triton Server into the application. I would not expect Triton Server to allow backends (or, potentially, itself?) to resolve symbols from the application or the application’s transitive dependencies. It makes me wonder whether Triton Server should internally be using dlmopen to load backends, putting each on its own link map via LM_ID_NEWLM (see the sketch after this list). If I’ve found the right part of the code, it currently just uses RTLD_LOCAL, which prevents libraries subsequently loaded by the application from resolving symbols in the backend library, but does not prevent the backend library from resolving symbols against those already loaded on the application’s main link map. Should I file an issue on GitHub to suggest dlmopen with LM_ID_NEWLM for loading backends?

  • I could have our application itself use dlmopen to load libtritonserver.so with LM_ID_NEWLM and then use dlsym to pull out each TRITONSERVER_ symbol that I need (again, see the sketch after this list). While inconvenient, I think that would isolate the entire tree of symbols/dependencies transitively rooted at libtritonserver.so from the application’s symbols. However, I’m not entirely confident that this isolation extends to dlopen calls made from within those libraries. If it does, I think this approach might work, and I plan to try it. It is, however, not a great experience for using the Triton SDK API, which by all appearances has done a very good job of creating a clean separation between users of the SDK and the backends.

  • I could try to build my own TensorFlow backend, and arrange for the TensorFlow build part of it to use our versions of gRPC and friends. Or vice versa, I could build our framework against TensorFlow’s versions. However, I’m not confident that I’m going to succeed in building such a backend, and, if I’m going to the trouble to build TF myself, it seems like maybe a better choice to switch my application to just using TF directly, rather than using Triton. Doubly so given that it seems that I need to use a patched version of TF in order to have it integrate with Triton. My aim in using Triton was in many ways to avoid needing to manage the build of the inference backends.

  • I could try to build all of Triton Server from source, and hope to get it and its TensorFlow backend to build against the same versions of gRPC and friends. This feels quite daunting, even more so than building just TensorFlow, and it is also contrary to what I was hoping to achieve by adopting Triton Server as my inference engine.
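To make the dlmopen idea from the second and third bullets concrete, the application-side variant I have in mind looks roughly like the sketch below. The library path is a placeholder, the function-pointer type is abbreviated (real code would use the typedefs from tritonserver.h), and I have not yet verified this on the Orin:

    // Load libtritonserver.so into a fresh link-map namespace so that neither it
    // nor anything it later dlopen()s can bind against the application's copies
    // of protobuf/gRPC/absl. glibc-specific (dlmopen, LM_ID_NEWLM); build with -ldl.
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      void* triton = dlmopen(LM_ID_NEWLM,
                             "/opt/tritonserver/lib/libtritonserver.so",
                             RTLD_NOW | RTLD_LOCAL);
      if (triton == nullptr) {
        std::fprintf(stderr, "dlmopen failed: %s\n", dlerror());
        return 1;
      }

      // Pull out each TRITONSERVER_ entry point the application needs and drive
      // the In-Process API only through these pointers.
      using OptionsNewFn = void* (*)(void**);
      auto options_new = reinterpret_cast<OptionsNewFn>(
          dlsym(triton, "TRITONSERVER_ServerOptionsNew"));
      if (options_new == nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
      }
      // ... dlsym the rest (TRITONSERVER_ServerNew, TRITONSERVER_ServerInferAsync,
      // etc.) and proceed as if this were an ordinary link-time dependency.
      return 0;
    }

My understanding is that dlopen calls made from code already living in a non-base namespace load into that same namespace, which is what would keep the TensorFlow backend away from our Protobuf, but I still need to confirm that, and I know dlmopen has rough edges of its own (a small fixed number of namespaces and some long-standing TLS-related issues).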

Overall, those are the ways I can currently see to move forward from here, but I’m definitely open to other suggestions or thoughts on how I can resolve this issue.

Thanks, with apologies for the (very) long and complex question,
Andrew

Hi,

Back to your question: which custom dependencies do you require?
Do you need a gRPC version that is not compatible with Jetson’s Triton Server?

Thanks.

One solution would be for us to build our framework using the same versions of the libraries as the Triton Server build of TensorFlow did.
Triton Server is an open-source library.

Yes, I’m aware, but building Triton is a fairly complex endeavor, and we had hoped to make use of the containers when available. Also, the build instructions suggest that the TensorFlow backend may have a different build procedure, and it is, I believe, that library which exhibits the conflicts.

  • Should I file an issue on GitHub to suggest dlmopen with LM_ID_NEWLM for loading backends
    Sure, that way you can discuss with the Triton team how backends should be loaded.

Thanks, I filed “Triton library does not fully insulate applications from backend symbols” (triton-inference-server/server issue #6221) on GitHub.

Back to your question: which custom dependencies do you require?

Our application requires at least gRPC, abseil, and protobuf, along with a few others.

Do you need a gRPC version that is not compatible with Jetson’s Triton Server?

Effectively, yes. But if Triton Server’s In-Process API is going to require that I share a version of the above libraries with it, then it should include a complete distribution of those libraries so that I can build my software against the same versions that Triton Server uses. This requirement extends to backend libraries that are dlopen’d, like the tensorflow_cc and tensorflow_backend libraries.
