Hello,
How can I configure TF-TRT to load multiple models (the models run inference sequentially in a single thread)?
The models are optimized with TF-TRT and I load them from frozen graphs for inference, but I cannot load more than one graph at execution time. The error is:
`tensorflow.python.framework.errors_impl.UnavailableError: Can't provision more than one single cluster at a time`
I am using Jetson TX2, Jetpack 4.3, Tensorflow 1.15.
Thank you.
I would suggest you look into the Triton Inference Server. It has fairly flexible capabilities for loading several models.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
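If you go the Triton route, each model gets its own directory in a model repository. A minimal sketch of the layout and a `config.pbtxt` for a TensorFlow SavedModel might look like this (the model names and paths here are illustrative, not from the original post):

```
model_repository/
├── model_a/
│   ├── config.pbtxt
│   └── 1/
│       └── model.savedmodel/
└── model_b/
    ├── config.pbtxt
    └── 1/
        └── model.savedmodel/

# config.pbtxt for model_a
name: "model_a"
platform: "tensorflow_savedmodel"
max_batch_size: 1
```

The server is then pointed at the repository (e.g. `tritonserver --model-repository=/path/to/model_repository`) and loads both models side by side, which avoids the single-cluster limitation entirely.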
Or you can post the question on the TF-TRT GitHub page.
Similar issue (opened 29 Sep 2019 on GitHub):
My code:
```python
# TF 1.15: the TF-TRT converter lives under tensorflow.python.compiler.tensorrt
from tensorflow.python.compiler.tensorrt import trt_convert as trt

FP32_SAVED_MODEL_DIR = SAVED_MODEL_DIR + "_TFTRT_FP32/1"
!rm -rf $… FP32_SAVED_MODEL_DIR

# Now we create the TF-TRT FP32 engine
trt.create_inference_graph(
    input_graph_def=None,   # not needed when converting a SavedModel
    outputs=None,
    max_batch_size=1,
    input_saved_model_dir=SAVED_MODEL_DIR,
    output_saved_model_dir=FP32_SAVED_MODEL_DIR,
    precision_mode="FP32")

benchmark_saved_model(FP32_SAVED_MODEL_DIR, BATCH_SIZE=1)
```
and I have set:
```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```
When I run it, I get an error:
`InvalidArgumentError: Failed to import metagraph, check error log for more info`
Then I added this line:
`tf.keras.backend.set_learning_phase(0)`
That error went away, but another one is raised:
`UnavailableError: Can't provision more than one single cluster at a time`
I am only using one GPU, an RTX 2080 Ti.
CUDA: Cuda compilation tools, release 10.0, V10.0.130
Can someone help me, please?
Thanks
Thank you.
I have solved the problem by creating multiple threads, each with its own session holding its own graph.
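The pattern described above can be sketched as follows. This is a minimal illustration of the threading structure only: `_load_session` is a stand-in for loading a TF-TRT frozen graph into a per-thread `tf.Graph`/`tf.Session` pair (shown in the comments), since the key point is that each thread owns its own graph and session instead of sharing one.

```python
import threading
import queue

class ModelWorker(threading.Thread):
    """One thread per model: each worker owns its own graph and session,
    so the TF-TRT engines are never provisioned in the same cluster."""

    def __init__(self, model_path, requests):
        super().__init__(daemon=True)
        self.model_path = model_path
        self.requests = requests  # queue of (input, reply_queue) pairs

    def run(self):
        # In real TF 1.x code the per-thread graph/session would be built here:
        #   graph = tf.Graph()
        #   with graph.as_default():
        #       tf.import_graph_def(load_frozen_graph(self.model_path))
        #   sess = tf.Session(graph=graph)
        session = self._load_session(self.model_path)
        while True:
            item = self.requests.get()
            if item is None:              # sentinel: shut this worker down
                break
            data, reply = item
            reply.put(session(data))      # sess.run(...) in real code

    def _load_session(self, path):
        # Stand-in for a loaded model: tags the input with the model name.
        return lambda x: f"{path}:{x}"

def infer(request_queues, data):
    """Run the models sequentially from the caller's point of view:
    hand the output of each model to the next one."""
    result = data
    for requests in request_queues:
        reply = queue.Queue()
        requests.put((result, reply))
        result = reply.get()              # block until this model finishes
    return result

if __name__ == "__main__":
    queues = [queue.Queue(), queue.Queue()]
    workers = [ModelWorker(name, q)
               for name, q in zip(["model_a", "model_b"], queues)]
    for w in workers:
        w.start()
    print(infer(queues, "input"))         # model_b:model_a:input
    for q in queues:
        q.put(None)                       # stop the workers
```

From the caller's side the inference is still sequential and effectively single-threaded; the worker threads exist only so that each model's session and graph stay isolated.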