I am trying to run inference on two different models concurrently using TensorRT.
I serialized both models to .engine files using ofstream, following the approach shown in several forum posts, then attempted to deserialize them and run inference on both concurrently. Doing so produces two errors:
ERROR: Cannot deserialize plugin FancyActivation
ERROR: getPluginCreator could not find plugin FancyActivation version 001 namespace
concurrencyTest: cuda/caskConvolutionLayer.cpp:153: virtual void nvinfer1::rt::task::caskConvolutionLayer::allocateResources(const nvinfer1::rt::CommonContext&): Assertion `configIsValid(context)' failed.
How would you advise running inference on two models concurrently?
I am running this on an NVIDIA Jetson Xavier with TensorRT version 5.0.6. The code is written using the C++ TensorRT API.