Is it possible to run multiple LLM instances in parallel using multithreading to handle multiple queries simultaneously on Jetson Orin AGX?

I’m exploring the possibility of running multiple Large Language Model (LLM) instances in parallel on the Jetson Orin AGX using multithreading or multiprocessing, so that multiple queries can be handled simultaneously to improve throughput and responsiveness in real-time applications. Is this feasible given the Orin AGX’s GPU and CPU architecture, and what would be the recommended approach: threading, multiprocessing, or containerization? A minimal sketch of what I have in mind is below. Any insights on resource allocation, performance optimization, or example implementations would be highly appreciated.
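
For concreteness, here is a minimal multiprocessing sketch of the pattern I'm considering: one worker process per model instance, with queries fanned out over a shared queue. In Python, multiprocessing (rather than threading) sidesteps the GIL and gives each instance its own CUDA context. The `load_model` and `generate` functions below are placeholders for whatever runtime would actually be used (e.g. llama.cpp or TensorRT-LLM bindings), not real APIs, and `NUM_WORKERS = 2` is an arbitrary assumption; how many model copies actually fit in Orin's unified memory depends on model size and quantization.

```python
# Hypothetical sketch: one worker process per model instance, queries
# distributed over a shared queue. load_model/generate are placeholders,
# not a real inference API.
import multiprocessing as mp

NUM_WORKERS = 2  # assumption; limited by unified memory and model size


def load_model(worker_id):
    # Placeholder: load one model copy per process here.
    return f"model-{worker_id}"


def generate(model, prompt):
    # Placeholder: run inference with the chosen runtime.
    return f"[{model}] response to: {prompt}"


def worker(worker_id, requests, responses):
    model = load_model(worker_id)
    while True:
        prompt = requests.get()
        if prompt is None:  # sentinel: shut down this worker
            break
        responses.put(generate(model, prompt))


if __name__ == "__main__":
    requests, responses = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(i, requests, responses))
             for i in range(NUM_WORKERS)]
    for p in procs:
        p.start()

    queries = ["What is CUDA?", "Summarize the Jetson Orin specs."]
    for q in queries:
        requests.put(q)
    for _ in queries:
        print(responses.get())

    for _ in procs:
        requests.put(None)  # one shutdown sentinel per worker
    for p in procs:
        p.join()
```

Would this kind of process-per-instance design make sense on the Orin AGX, or would a single instance with request batching (or containerized instances) be the better fit for its shared-memory architecture?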