Run multiple models at the same time on Xavier


I want to know whether it is possible to run multiple models at the same time on Xavier.

For example, I have two trained models, t1.pb and t2.pb. I want to load both models in a single run, with t1.pb running on the GPU and t2.pb running on XLA, and have them execute in parallel.

If you have any relevant information, please let me know. Thank you.

It seems I can specify the device a model runs on (with tf.device('/gpu:0')), but what should I do to make the two models run simultaneously, instead of t1.pb running on the GPU and then t2.pb running on XLA serially?
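One common way to let two models overlap is to give each its own tf.Graph and tf.Session (each built under its own with tf.device(...) block) and call each session from its own Python thread. This is a minimal sketch of the threading part only, under stated assumptions: fake_t1 and fake_t2 are hypothetical placeholders that sleep instead of calling session.run(...), and it assumes, as in TensorFlow 1.x, that a session can be run from a worker thread. Even with overlapping calls, both models still compete for the same GPU, so a speed-up is not guaranteed.

```python
import concurrent.futures
import time


def run_models_concurrently(models, inputs):
    """Call each model on its input in a separate thread; return results in input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(model, x) for model, x in zip(models, inputs)]
        # result() blocks until that model finishes and re-raises any exception it hit.
        return [f.result() for f in futures]


# Hypothetical stand-ins: in real code these would call session.run(...) on the
# session holding t1.pb (GPU) and the session holding t2.pb (XLA) respectively.
def fake_t1(x):
    time.sleep(0.2)  # simulate inference latency
    return ("t1", x * 2)


def fake_t2(x):
    time.sleep(0.2)  # simulate inference latency
    return ("t2", x + 1)


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_models_concurrently([fake_t1, fake_t2], [10, 10])
    print(results, f"{time.perf_counter() - start:.2f} s")
```

With the placeholders, the two calls overlap, so the total time should be close to one model's latency rather than the sum of both.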

Device information:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2

Also, how do I run a model on XLA_GPU?

with tf.device('device:XLA_GPU:0'):

doesn't work.


It is workable, but you may need to convert the models to TensorRT first.

By default, TensorFlow claims almost all of the available resources, including GPU memory, which makes running models in parallel almost impossible.
You can give it a try, but you may run into resource starvation or even unexplained failures.
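If you do try TensorFlow for both models, one way to reduce that contention is to stop each TensorFlow process from claiming all GPU memory. In TensorFlow 1.x this is done through tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=...)), or with allow_growth=True instead. The sketch below only does the budgeting arithmetic: the per-model estimates and the total are hypothetical numbers you would have to measure yourself, and because the option applies per process, it assumes each model runs in its own process. On Xavier the GPU shares physical memory with the CPU, so it is worth leaving headroom.

```python
def plan_memory_fractions(needs_mb, total_mb, reserve_mb=0.0):
    """Turn per-model memory estimates (MB) into per-process memory fractions.

    Each fraction is relative to total_mb, assumed to be the GPU memory
    TensorFlow reports. reserve_mb is left unassigned for the OS and other
    processes.
    """
    usable_mb = total_mb - reserve_mb
    if usable_mb <= 0:
        raise ValueError("reserve_mb leaves no usable memory")
    if sum(needs_mb) > usable_mb:
        raise ValueError(
            f"models need {sum(needs_mb)} MB but only {usable_mb} MB is usable"
        )
    return [need / total_mb for need in needs_mb]


if __name__ == "__main__":
    # Hypothetical numbers: two models needing 2000 MB and 1000 MB out of 8000 MB.
    print(plan_memory_fractions([2000, 1000], total_mb=8000, reserve_mb=1000))
```

Each returned value would then be passed as per_process_gpu_memory_fraction in the session config of the corresponding model's process; raising an error up front is cheaper than discovering the overcommit as a crash at run time.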

TensorRT is a much better solution.
We also have a higher-level library built on top of TensorRT called DeepStream.


I need sample code for this solution showing how to distribute memory between engines in TensorRT.
My code currently works with a single engine, but with multiple engines it crashes because of memory.
Many thanks.
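For the multiple-engine crash, one approach TensorRT supports is letting several execution contexts share one activation ("device") memory buffer instead of each allocating its own. Assuming a TensorRT 7/8-era Python API, ICudaEngine.device_memory_size gives the scratch size an engine needs, create_execution_context_without_device_memory() creates a context with none allocated, and IExecutionContext.device_memory accepts a device pointer you supply. This is a minimal sketch of the sizing only: the engine sizes are hypothetical, the actual allocation (for example with PyCUDA) is not shown, and the 256-byte alignment is an assumption to check against your TensorRT version. Reusing one region (shared=True) is only safe when the engines never execute at the same time, for example when they are all enqueued on the same CUDA stream.

```python
def plan_scratch(engine_sizes, shared, alignment=256):
    """Plan one device buffer for the scratch memory of several TensorRT engines.

    Returns (total_bytes, offsets), where offsets[i] is where engine i's region
    starts inside the buffer.
    shared=True:  engines run one at a time, so all reuse the same region.
    shared=False: engines may run concurrently, so each gets its own aligned slice.
    alignment=256 is an assumption; check your TensorRT version's requirement.
    """
    if shared:
        return max(engine_sizes, default=0), [0] * len(engine_sizes)
    offsets, cursor = [], 0
    for size in engine_sizes:
        # Round the cursor up to the next multiple of `alignment`.
        cursor = (cursor + alignment - 1) // alignment * alignment
        offsets.append(cursor)
        cursor += size
    return cursor, offsets


if __name__ == "__main__":
    sizes = [1000, 3000, 500]  # hypothetical device_memory_size values, in bytes
    print(plan_scratch(sizes, shared=True))   # one region sized for the largest engine
    print(plan_scratch(sizes, shared=False))  # separate aligned slices for concurrency
```

Sharing needs only the largest engine's scratch size, which is usually what makes several engines fit; giving each engine its own slice costs the sum of the sizes but allows them to run concurrently.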