Questions about GH200 Superchip

We are looking at the GH200 Superchip and have three questions:

  1. Can you train any model architecture (LLMs, latent diffusion models, etc.) the same way as on the H100, just with more available memory?

  2. Is the architecture scalable? For example, if we buy 1x GH200 now, can we buy another one some months down the line and interconnect the two?

  3. Is it possible to test-drive a GH200 before buying?