Have I written custom code : No
OS Platform and Distribution: CentOS 7
TensorFlow installed from: source
TensorFlow version: tensorflow-serving branch r1.7
Bazel version: 0.11.1
CUDA/cuDNN version: CUDA 9.0, cuDNN 7.0.5, TensorRT 4.0.4 (actual version; see PS below)
I compiled TensorFlow Serving r1.7 with TensorRT 4.0.4, and the build completed successfully:
At global scope:
cc1plus: warning: unrecognized command line option '-Wno-self-assign'
INFO: Elapsed time: 1452.421s, Critical Path: 479.68s
INFO: Build completed successfully, 11375 total actions
But when I start the server and load a TF-TRT-optimized model, I get this error:
2018-06-07 17:41:40.910874: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:242] Loading SavedModel with tags: { serve }; from: /media/disk1/fordata/web_server/project/LdaBasedClassification_623_1.7/data/cate155_tftrt_frozen/1
2018-06-07 17:41:41.030117: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-07 17:41:41.283451: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:84:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2018-06-07 17:41:41.283514: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-06-07 17:41:41.601178: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-07 17:41:41.601253: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-06-07 17:41:41.601273: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-06-07 17:41:41.601561: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10970 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:84:00.0, compute capability: 6.1)
2018-06-07 17:41:41.878689: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: fail. Took 967809 microseconds.
2018-06-07 17:41:41.878771: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: inception_v3 version: 1} failed: Not found: Op type not registered 'TRTEngineOp' in binary running on bjpg-g180.yz02. Make sure the Op and Kernel are registered in the binary running in this process.
It looks like TRTEngineOp is still not registered in this binary.
I'm not 100% sure my way of building TensorFlow Serving 1.7 with TensorRT is correct, but the configure step did search for and find libnvinfer.so etc., and it also verified that the TensorRT version was correct. So I don't know why the resulting binary still can't support TRTEngineOp.
Here are my environment variables:
export TENSORRT_INSTALL_PATH="/home/karafuto/TensorRT-3.0.4/lib"
export TENSORRT_LIB_PATH="/home/karafuto/TensorRT-3.0.4/lib"
export TF_TENSORRT_VERSION=4.0.4
This is my compilation command:
sed -i.bak 's/@org_tensorflow\/\/third_party\/gpus\/crosstool/@local_config_cuda\/\/crosstool:toolchain/g' tools/bazel.rc
bazel build --config=cuda --action_env PYTHON_BIN_PATH="/home/karafuto/dlpy72/dlpy/bin/python2.7" TENSORRT_BIN_PATH="/home/karafuto/TensorRT-3.0.4" -c opt tensorflow_serving/...
I'm not sure whether my procedure is correct. Very little documentation can be found on how to build TensorFlow Serving 1.7 with TensorRT. Can anyone give me a clue?
Thanks
PS: The TensorRT package was downloaded from the official NVIDIA website; the tar file is named "TensorRT-3.0.4.Ubuntu-14.04.5.x86_64.cuda-9.0.cudnn7.0.tar.gz". The weird thing is, after unpacking the tar file I found the actual version is 4.0.4, not 3.0.4. So for tensorflow-serving r1.7 I had to set TF_TENSORRT_VERSION=4.0.4 to get past the version check.
I encountered the two errors below and solved them, so I believe the Bazel build did indeed pick up TensorRT. Posting them here as evidence.
This is the error when TENSORRT_LIB_PATH is set incorrectly (libnvinfer.so cannot be found):
ERROR: error loading package 'tensorflow_serving/apis': Encountered error while reading extension file 'build_defs.bzl': no such package '@local_config_tensorrt//': Traceback (most recent call last):
File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 160
auto_configure_fail("TensorRT library (libnvinfer) v...")
File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 210, in auto_configure_fail
fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: TensorRT library (libnvinfer) version is not set.
This is the error when TF_TENSORRT_VERSION does not match the detected libnvinfer version:
ERROR: error loading package 'tensorflow_serving/apis': Encountered error while reading extension file 'build_defs.bzl': no such package '@local_config_tensorrt//': Traceback (most recent call last):
File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 167
_trt_lib_version(repository_ctx, trt_install_path)
File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 87, in _trt_lib_version
auto_configure_fail(("TensorRT library version detec...)))
File "/home/web_server/.cache/bazel/_bazel_web_server/7039d45003118564d66f2b06f1b7ea68/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 210, in auto_configure_fail
fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: TensorRT library version detected from /media/disk1/fordata/web_server/project/xiaolun/TensorRT-3.0.4/include/NvInfer.h (4.0.4) does not match TF_TENSORRT_VERSION (3.0.4). To fix this rerun configure again.