Can we cluster two Thors together for distributed LLM inference or model training?

I have connected two Thors using a QSFP28 cable, but I’m not sure if Thor supports joint training or inference. Could you provide some guidance on how to set this up?

I would think you could adapt this method from DGX Spark to Thor.

https://build.nvidia.com/spark/connect-two-sparks
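At the network level, the Spark guide above essentially boils down to bringing up the point-to-point link with static IPs on both machines. A hedged sketch of the equivalent on two Thors (the interface name `enp1s0` and the addresses are placeholders; check yours with `ip link`):

```shell
# On Thor A: assign a static address to the QSFP interface
# (interface name and subnet are assumptions; adjust to your system)
sudo ip addr add 192.168.100.1/24 dev enp1s0
sudo ip link set enp1s0 up

# On Thor B: same, with a different address in the same subnet
sudo ip addr add 192.168.100.2/24 dev enp1s0
sudo ip link set enp1s0 up

# Verify connectivity from Thor A
ping -c 3 192.168.100.2
```

Once the two boards can reach each other, a multi-node framework (e.g. `torchrun` with one of the IPs as the rendezvous address) can be layered on top, as in the Spark guide.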

Wow, cool! Thanks! It might be a bit harder on Thor, though, because Spark has a 200GbE QSFP port while Thor only supports 25G for now.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.