We are developing a large-scale parallel simulation system for construction machinery such as excavators and cranes using Isaac Sim and Ray. Our objective is to compress a 4-hour real-world construction cycle into less than 1.5 hours of simulation time.
We plan to run the workload on one NVIDIA A100 GPU (40 GB) as the base compute node, with 64 environment instances on this A100. Distribution is managed via the Ray framework, targeting 1,000+ total concurrent instances across a GPU cluster.
I would like to seek advice on the following:
- VRAM & Density: Is 64 instances per A100 realistic for “construction-level” complexity (high-poly machinery + deformable terrain/particles)? What is the typical VRAM footprint for such a setup in Isaac Sim?
- CPU Bottleneck: Given 64 instances on one GPU, how many CPU cores are recommended to avoid becoming the bottleneck for Ray’s task scheduling and step synchronization?
- Real-time Factor: To achieve a 2.6x speedup (4h task in 1.5h), is it more efficient to use “Multi-Process” (multiple Isaac Sim apps) or “Vectorized Environments” (single process, multiple clones via Tensor API)?
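For context on the third question, here is the minimal stand-in we use to reason about the two stepping styles. This is plain NumPy, not Isaac Sim: `step_looped` and `step_vectorized` are hypothetical placeholders for one physics tick, and the per-env loop stands in for the per-process stepping you would get with multiple Isaac Sim apps.

```python
import numpy as np

N_ENVS = 64     # clones per GPU, as in our target setup
STATE_DIM = 12  # hypothetical per-machine state (pose, joint angles, ...)

def step_looped(states, actions, dt=1.0 / 60.0):
    """'Multi-process' flavour: each env stepped independently."""
    out = np.empty_like(states)
    for i in range(N_ENVS):
        out[i] = states[i] + dt * actions[i]  # placeholder integration
    return out

def step_vectorized(states, actions, dt=1.0 / 60.0):
    """'Vectorized' flavour: one batched tensor op over all 64 clones."""
    return states + dt * actions  # single fused update

rng = np.random.default_rng(0)
s = rng.standard_normal((N_ENVS, STATE_DIM))
a = rng.standard_normal((N_ENVS, STATE_DIM))
# Both produce identical states; the question is scheduling overhead.
assert np.allclose(step_looped(s, a), step_vectorized(s, a))
```

The physics here is a trivial Euler update, but the structural difference is the one we care about: 64 independent step calls versus one batched call per tick.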
Thank you
I can tell you straight away that the A100 is not the right GPU for this. It will not work that way. The A100 is a data-center compute card. You need a high-end RTX GPU. See below. Ideally an RTX A6000 or an L40S.
For Omniverse, the A100 is technically usable for compute, but Omniverse's real-time graphics stack is optimized for, and officially recommended around, RTX/graphics GPUs such as the RTX 6000 Ada (and successors), while the A/H/B "datacenter" series (A100/H100/H200/B100/B200) are aimed at large-scale AI/compute rather than interactive viewport work. forums.developer.nvidia
How Omniverse views these GPU families
- Omniverse requires RTX‑class ray‑tracing hardware; NVIDIA explicitly recommends RTX‑branded GPUs (e.g., RTX 6000 Ada) for Omniverse workloads, not pure datacenter accelerators like A100. docs.omniverse.nvidia
- Datacenter accelerators (A/H/B series) can run CUDA/AI workloads that you drive from Omniverse (e.g., training/inference servers), but they are not the primary recommendation for running the Omniverse client/renderer itself. forums.developer.nvidia
High‑level comparison
| Family / Card | Primary use | Memory (type / size) | Strengths for Omniverse-style work | Main drawbacks for Omniverse UX |
| --- | --- | --- | --- | --- |
| A100 (Ampere) | Datacenter AI / HPC training & inference | 40–80 GB HBM2e, up to ~2 TB/s bandwidth | Huge tensor throughput, great for backend AI services | Not an RTX workstation GPU; the Omniverse team notes the A100 "cannot run Omniverse" as a normal graphics system. forums.developer.nvidia |
| RTX 6000 Ada (Ada) | Workstation RTX graphics + AI | 48 GB GDDR6 ECC, ~1 TB/s-class bandwidth | Designed and recommended for Omniverse; strong RT cores, display outputs, and large VRAM for scenes. docs.omniverse.nvidia | Less raw FP8/FP4 tensor throughput than the newer H/B series for giant LLMs. bizon-tech |
| H-series (H100/H200) | Datacenter AI (LLMs, generative AI) & HPC | 80 GB HBM3 (H100) to 141 GB HBM3e (H200) bizon-tech | Massive AI throughput (FP8/FP16); great as compute backends for Omniverse-driven AI pipelines. bizon-tech | Datacenter form factor, no display output; overkill and awkward as an Omniverse "desktop" GPU. |
| B-series (B100/B200) | Next-gen Blackwell AI accelerators | ~192 GB HBM3e, up to 8 TB/s bandwidth. northflank | Top-end LLM / simulation compute; ideal as remote/cluster compute targeted by Omniverse clients. northflank | Very high power, datacenter only, not a practical Omniverse viewport card. |
A100 vs RTX 6000 Ada in practice
- A100 is built around GA100 compute silicon with HBM2e and focuses on tensor/FP64 performance for training and HPC; RTX A6000/RTX 6000 Ada use graphics‑oriented chips with strong RT cores and display outputs. server-parts
- RTX 6000 Ada is explicitly positioned as the ideal Omniverse/VR/visualization GPU with 48 GB GDDR6 and full RTX graphics stack; that’s why Omniverse workstation requirements list it, not A100. nvidia
- For scenes, path tracing, USD workflows, and interactive viewports, RTX 6000 Ada will generally give a better Omniverse user experience than A100 at similar cost, even though A100 may win in raw tensor FLOPs. bizon-tech
H and B series vs A100 for Omniverse‑adjacent workloads
- H100/H200 substantially increase AI throughput and memory bandwidth over A100, making them better choices if your Omniverse deployment talks to large LLM/simulation backends (e.g., via microservices) rather than doing everything on the viewport GPU. bizon-tech
- Blackwell B100/B200 further push AI performance and memory, supporting very large context windows and batch sizes, but stay in the same “backend accelerator” role—excellent for servers Omniverse connects to, not as the GPU driving the Omniverse UI. northflank
How to choose for an Omniverse deployment
- For artist/engineer workstations or streaming servers primarily running Omniverse render/viewport: prioritize RTX 6000 Ada (or multiple RTX‑class cards), since that aligns with NVIDIA’s own Omniverse requirements. nvidia
- For large‑scale AI or physics that Omniverse orchestrates (e.g., sim farm, LLM agents): place H100/H200/B100/B200 in backend nodes and keep RTX 6000 Ada (or similar) on the front‑end or streaming nodes. nvidia
- A100 now sits in an in‑between spot: solid as an older backend accelerator, but not recommended as the primary GPU for running Omniverse itself, especially compared with RTX 6000 Ada and newer H/B generations. server-parts
What’s your main Omniverse use case: interactive design/visualization for users, or mostly driving heavy AI/simulation workloads in the background?
Thanks for your reply. Our work mainly focuses on driving heavy AI/simulation workloads in the background rather than interactive design/visualization for users. Our research uses machine learning and deep learning to train agents, so what we want is for the agents to achieve target goals, not to interact with users.
Omniverse requires an RTX chipset, and the A100 is not capable of that. It has to be an H-series or B-series card, or an L40S or an RTX A6000.
Omniverse is not straight machine learning, in the sense of just crunching the math. Omniverse is a special case: it has to do visual computing on an RTX chipset.
Here is a table of the GPUs that are supported. As you can see, the A100 is datacenter-only and will not work with Omniverse.
| Architecture / Series | Example GPUs | Omniverse RTX renderer support | Notes |
| --- | --- | --- | --- |
| Turing RTX (consumer) | GeForce RTX 2070, 2080 Ti | Yes (RTX required; the minimum is often the 2070) omniverse.nvidia+1 | Works but below the recommended spec for heavy scenes. |
| Turing RTX (pro) | Quadro RTX 4000–8000 | Yes manual.reallusion | Common in Omniverse Enterprise deployments. |
| Ampere RTX (consumer) | GeForce RTX 3070–3090 | Yes; the 3070 is often cited as the minimum for ray-tracing workloads omniverse.nvidia+1 | A 3060 can work but is below spec and may struggle. developer.nvidia |
| Ampere RTX (pro) | RTX A4000–A6000 | Yes; explicitly supported in Omniverse Enterprise docs engineering | Strongly recommended for professional workloads. |
| Ada RTX (consumer) | GeForce RTX 4070–4090 | Yes (RTX, newer than Turing) omniverse.nvidia+1 | Very good for real-time path tracing. |
| Ada RTX (pro / data center) | L40, L40S | Yes; listed for Omniverse services/inference, and the L40/L40S have RT cores omniverse.nvidia+1 | Common in server-side Omniverse setups. |
| Blackwell RTX | RTX 5090, RTX 6000 Blackwell | Yes in current RTX feature matrices; recommended in recent Omniverse workstation guidance omniverse.nvidia+1 | Newest generation, high-end. |
| Non-RTX older (Tesla, Fermi, Kepler, Maxwell, Pascal, Volta) | e.g., GTX 1080 Ti, Tesla V100 | No support for the Omniverse RTX Renderer omniverse.nvidia+1 | Some SDK pieces work without RTX, but the renderer is unsupported. |
| A100 | A100 | Not supported for Omniverse rendering developer.nvidia | Datacenter compute GPU without RTX focus; forum staff say it "cannot run Omniverse." developer.nvidia |
Thanks for your reply. Given our current hardware configuration centered on A100s for large-scale AI training, we have decided to pivot our strategy. We will focus our efforts on multi-agent RL (MARL) simulations using `--headless` mode. This allows us to bypass the GPU-intensive rendering requirements while leveraging the A100's massive parallel compute power for physics-based simulation and task execution. We plan to integrate visualization-heavy workflows (via RTX-enabled GPUs like the A6000) only in the later stages, if required for model validation or debugging.
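To make the pivot concrete, here is a minimal sketch of the orchestration pattern we have in mind. The standard-library `ThreadPoolExecutor` stands in for Ray actors, and `rollout` is a hypothetical placeholder for where each worker would launch a headless Isaac Sim instance and step its share of the environment clones; the worker counts are just our planned split, not measured numbers.

```python
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 4         # stand-in for Ray actors; one headless sim app each
ENVS_PER_WORKER = 16  # 4 workers x 16 envs = 64 env instances per A100

def rollout(worker_id, n_steps=100):
    """Placeholder for a headless physics rollout.

    In the real system this would create a headless sim app and step its
    16 cloned environments; here we only count environment steps so the
    orchestration pattern is runnable anywhere.
    """
    steps_done = 0
    for _ in range(n_steps):
        steps_done += ENVS_PER_WORKER  # one batched step across the clones
    return worker_id, steps_done

with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    results = dict(pool.map(rollout, range(N_WORKERS)))

total_env_steps = sum(results.values())
print(total_env_steps)  # 4 workers x 100 steps x 16 envs = 6400
```

In production the pool would be replaced by Ray actors on separate processes (one GPU context each), but the fan-out/gather shape of the loop stays the same.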