I’m looking to set up a DGX-like system for GPUDirect Storage (GDS). It’s going to have four H100 GPUs and four NVMe drives. Following the GDS documentation, it’ll have two PCIe buses, each with two drives and two GPUs. Based on some documentation (NVIDIA GPUDirect Storage Benchmarking and Configuration Guide - NVIDIA Docs), it looks like putting the two drives on each bus in a RAID0 is reasonable. However, that appears to be a software RAID, which presumably has to involve the CPU. Does that have performance implications compared with mounting each drive separately?
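To make the RAID0 part of the question concrete, here is a minimal sketch of the address arithmetic that striping implies: a logical offset on the array is mapped to one member drive and an offset on that drive. The function name `raid0_locate` is hypothetical, and the 512 KiB chunk size and two-drive layout are assumptions for illustration, not values taken from the NVIDIA guides.

```python
def raid0_locate(offset, chunk_size=512 * 1024, n_drives=2):
    """Map a logical byte offset on a RAID0 array to (drive index, byte offset on that drive).

    chunk_size and n_drives are illustrative assumptions, not recommended settings.
    """
    # Which chunk the offset falls in, and how far into that chunk it is.
    chunk_index, within_chunk = divmod(offset, chunk_size)
    # Chunks are dealt round-robin across the drives; a full round is one stripe.
    stripe, drive = divmod(chunk_index, n_drives)
    return drive, stripe * chunk_size + within_chunk


# A 2 MiB read starting at offset 0 touches four chunks, alternating drives.
for chunk_start in range(0, 2 * 1024 * 1024, 512 * 1024):
    print(chunk_start, raid0_locate(chunk_start))
```

This only shows the mapping itself. Whether doing that mapping in software costs anything measurable on the GDS path is exactly what I’m unsure about.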
It looks like ext4 is reasonable for the NVMe drives (NVIDIA GPUDirect Storage Installation and Troubleshooting Guide - NVIDIA Docs). Since this is for DMA between disk and GPU, though, are there any non-typical setup parameters I should consider?
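For comparing setups, a small sketch like the following could list the options each ext4 filesystem is currently mounted with, given text in the `/proc/mounts` format. The helper name `ext4_mount_options` and the device names and mount points in the sample are hypothetical. It only reports what is set and makes no claim about which options GDS actually needs.

```python
def ext4_mount_options(mounts_text):
    """Return {mount point: [options]} for every ext4 entry in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        # Each line is: device, mount point, filesystem type, options, dump, pass.
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ext4":
            result[fields[1]] = fields[3].split(",")
    return result


# Hypothetical sample; on a real system, pass the contents of /proc/mounts instead.
sample = (
    "/dev/md0 /mnt/gds0 ext4 rw,relatime 0 0\n"
    "/dev/nvme2n1 /mnt/gds1 ext4 rw,noatime 0 0\n"
    "tmpfs /run tmpfs rw,nosuid 0 0\n"
)
for mount_point, options in ext4_mount_options(sample).items():
    print(mount_point, options)
```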
Thanks.