Best System for both Tensor and RT based accelerated computations


Please excuse any NOOB details here…

I’m looking to spec out a machine learning GPU server. Desirements include at least 192 GB of GPU memory (ideally more), with gold-standard Tensor cores and RT cores in the same system. Basically, I need to render virtual camera sensors in Isaac Sim during the learning process, while training my agents with Isaac Gym.

I can get Tensor, RT, and CUDA cores in the RTX A6000 board but my understanding is that the maximum number that I can connect with NVLink is two. So, I’m limited to a 96GB system. Maybe I’m wrong here?
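
The arithmetic behind that 96 GB figure, as a quick sketch. It assumes 48 GB per RTX A6000 and takes the two-card NVLink limit as stated above (not verified here); the 192 GB target is the one from my first paragraph. The function names are made up for illustration.

```python
import math

RTX_A6000_MEMORY_GB = 48   # assumed per-card memory (2 x 48 = the 96 GB above)
NVLINK_MAX_CARDS = 2       # bridge limit as understood in the post, not verified
TARGET_MEMORY_GB = 192     # stated memory target


def nvlinked_memory_gb(cards: int = NVLINK_MAX_CARDS) -> int:
    """Total memory across one NVLink-bridged group of cards."""
    return cards * RTX_A6000_MEMORY_GB


def cards_needed(target_gb: int = TARGET_MEMORY_GB) -> int:
    """Smallest number of cards whose combined memory reaches target_gb."""
    return math.ceil(target_gb / RTX_A6000_MEMORY_GB)


if __name__ == "__main__":
    print(nvlinked_memory_gb())  # memory of one bridged pair
    print(cards_needed())        # cards required to reach the target
```

If the two-card limit holds, four cards would reach 192 GB on paper, but only as two separate NVLink pairs with PCIe between the pairs.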

Ideally, I’d like the new H100 GPU (gold standard for Tensor cores) linked with the best RT and CUDA core enabled GPU (I think that’s the A40). However, it’s unclear to me if you can connect the H100 (or the A100) with the A40 with NVLink.

Surely I’m not the first person wanting to configure a system with more than 96GB that has the latest state-of-the-art in Tensor, RT, and CUDA core technology.

What am I missing?

Thanks in advance…

NVLink can only connect identical cards, so you can’t bridge an A100 and A40 (for example).
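
A minimal sketch for checking which cards in an existing box are even candidates for bridging under that rule. It groups GPUs by model name as reported by `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`. The helper names and the sample output are illustrative, and matching names are only a necessary condition, not proof that a bridge is supported or installed.

```python
import subprocess
from collections import defaultdict

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def read_inventory() -> str:
    """Run nvidia-smi and return its CSV output (needs an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def group_by_model(csv_text: str) -> dict:
    """Map each GPU model name to a list of per-card memory sizes in MiB."""
    groups = defaultdict(list)
    for line in csv_text.strip().splitlines():
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        groups[name].append(int(mem_mib))
    return dict(groups)


def bridge_candidates(groups: dict) -> dict:
    """Keep only models that appear at least twice (identical cards)."""
    return {name: mems for name, mems in groups.items() if len(mems) >= 2}


# Illustrative output only, not captured from a real machine.
SAMPLE = """NVIDIA RTX A6000, 49140
NVIDIA RTX A6000, 49140
NVIDIA A40, 46068
"""

if __name__ == "__main__":
    print(bridge_candidates(group_by_model(SAMPLE)))
```

On a live system, swap `SAMPLE` for `read_inventory()`; `nvidia-smi topo -m` then shows the actual link type between each pair of GPUs.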

For DGX, we have the NVIDIA DGX Station A100, although there are no RT cores in there (the GPU is meant for basic desktop and visualization work, not fancy stuff).

I’d recommend checking out the NVIDIA RTX and Quadro Workstations for Data Science page (near the bottom) for some NVIDIA partners that offer systems like the one you’re after.

Thanks Scott

Scott (and anyone else)…

Let’s approach this from a different angle…

Given the new Grace and Hopper technology that is coming out… what would be an ideal server configuration for maximizing compute performance when many parallel virtual environments run concurrently in Isaac Gym, and virtual camera sensors rendered by Isaac Sim provide the state/observation input to the reinforcement learning policy being trained?

Discussions with some NVIDIA hardware partners have suggested multiple H100s with multiple A40s, all connected through PCIe Gen4 or Gen5. Is this really the best hardware architecture for the use case described above?
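
One rough way to sanity-check the PCIe suggestion is to estimate how much camera data would have to cross the bus from the rendering GPUs to the training GPUs. The sketch below assumes uncompressed frames and approximate theoretical x16 bandwidth, per direction, of about 31.5 GB/s for PCIe Gen4 and 63 GB/s for Gen5. The environment count, resolution, and step rate are placeholders, not measurements, and the function names are hypothetical.

```python
# Approximate theoretical x16 bandwidth in GB/s, one direction.
PCIE_X16_GBPS = {"PCIe4": 31.5, "PCIe5": 63.0}


def camera_traffic_gbps(num_envs, width, height, channels=4,
                        bytes_per_channel=1, steps_per_sec=60):
    """Uncompressed camera data produced per second, in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_frame = width * height * channels * bytes_per_channel
    return num_envs * bytes_per_frame * steps_per_sec / 1e9


def bus_utilization(traffic_gbps, link="PCIe4"):
    """Fraction of one x16 link's theoretical bandwidth the traffic would use."""
    return traffic_gbps / PCIE_X16_GBPS[link]


if __name__ == "__main__":
    # Placeholder workload: 1024 envs, one 128x128 RGBA camera each, 60 steps/s.
    traffic = camera_traffic_gbps(num_envs=1024, width=128, height=128)
    for link in PCIE_X16_GBPS:
        share = bus_utilization(traffic, link)
        print(f"{link}: {traffic:.2f} GB/s, {share:.0%} of one x16 link")
```

Real throughput will be below the theoretical figure, and whether frames have to leave the GPU that rendered them at all depends on how Isaac Sim and Isaac Gym are set up, so this is only a first-order estimate.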

Are there current Isaac Sim/Gym users out there that can either confirm that this is a good server configuration or suggest a better configuration?