Notes on DGX Spark Bundle QSFP Connection
I thought it would be helpful for others to access my technical notes as I ran into some issues with the official documentation.
The official documentation only covers connecting two Sparks with a QSFP cable. Connecting them does not by itself mean that your AI models will use both Sparks at the same time. That part is the harder step.
If you are lazy, like me, and are looking for a minimal, ready-to-go guide to connecting your Sparks, here it is:
Step 1: Connect the two Sparks using their first QSFP port (the one closer to the power button) and ensure you have the same username on both of them. Here I assume your username is USERNAME. Update it as needed.
Step 2: Set the IP addresses:
On DGX1:
sudo ip addr add 192.168.100.11/24 dev enp1s0f0np0
sudo ip link set enp1s0f0np0 up
On DGX2:
sudo ip addr add 192.168.100.12/24 dev enp1s0f0np0
sudo ip link set enp1s0f0np0 up
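Addresses set with ip addr add are runtime-only, so Step 2 has to be repeated after a reboot. If you expect to rerun it, a small wrapper can save typing. This is a minimal sketch and not something from the official documentation: setup_link is a hypothetical helper name, and it uses ip addr replace instead of ip addr add so that rerunning it does not error out when the address is already set.

```shell
#!/bin/sh
# Hypothetical helper: assign an address to the QSFP interface and bring it up.
# Usage: setup_link 192.168.100.11/24 [interface]
setup_link() {
    addr="$1"
    iface="${2:-enp1s0f0np0}"   # interface name used in Step 2
    # "replace" adds the address, or updates it if it already exists.
    sudo ip addr replace "$addr" dev "$iface" &&
    sudo ip link set "$iface" up
}
```

On DGX1 you would call setup_link 192.168.100.11/24, and on DGX2 setup_link 192.168.100.12/24.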
If you want to double check:
ip addr show enp1s0f0np0
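Seeing the address on the interface only confirms the local side. To confirm that the cable and both addresses work, you can ping the other Spark. The sketch below assumes the addresses from Step 2; check_peer is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: ping the other Spark over the QSFP link.
# Usage: check_peer 192.168.100.12   (on DGX1; use 192.168.100.11 on DGX2)
check_peer() {
    peer="$1"
    # -c 3: send three probes; -W 2: wait at most 2 seconds for each reply.
    if ping -c 3 -W 2 "$peer" >/dev/null 2>&1; then
        echo "link OK: $peer is reachable"
    else
        echo "link DOWN: no reply from $peer"
        return 1
    fi
}
```

The function returns a non-zero status when the peer does not answer, so it can also be used inside a larger script.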
Step 3: On each Spark, do this:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub USERNAME@192.168.100.11
ssh-copy-id -i ~/.ssh/id_rsa.pub USERNAME@192.168.100.12
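If a Spark already has a key at ~/.ssh/id_rsa, ssh-keygen will ask whether to overwrite it, and overwriting breaks any login that relied on the old key. A minimal sketch of a guard, assuming the key path from Step 3; ensure_key is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: create the key pair only if it does not exist yet.
# Usage: ensure_key [path-to-private-key]
ensure_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    if [ -f "$key" ]; then
        echo "key already exists: $key"
    else
        # Same key type and size as Step 3; ssh-keygen asks for a passphrase.
        ssh-keygen -t rsa -b 4096 -f "$key"
    fi
}
```

Calling ensure_key with no argument uses ~/.ssh/id_rsa. The two ssh-copy-id commands stay the same afterwards.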
Step 4: Verify that this works on each Spark without any password prompt or error:
ssh 192.168.100.11 hostname
ssh 192.168.100.12 hostname
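A plain ssh call falls back to a password prompt when key login fails, which is easy to miss in a script. The sketch below uses the OpenSSH option BatchMode=yes so the command fails instead of prompting. check_ssh is a hypothetical helper name, and the addresses are the ones chosen in Step 2.

```shell
#!/bin/sh
# Hypothetical helper: confirm key-based login works for each given host.
# Usage: check_ssh 192.168.100.11 192.168.100.12
check_ssh() {
    status=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key is an error.
        if name=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname 2>/dev/null); then
            echo "OK $host ($name)"
        else
            echo "FAIL $host"
            status=1
        fi
    done
    return "$status"
}
```

Run check_ssh 192.168.100.11 192.168.100.12 on each Spark. A FAIL line means either the key was not copied to that address or the host is unreachable.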
Note: I chose 192.168.100.11 and 192.168.100.12 arbitrarily. You can change the addresses and the /24 prefix if you are comfortable with IP subnetting. Just ensure the QSFP link has its own separate network. For example, if your Sparks are also connected to your Wi-Fi network and its addresses look like 192.168.100.X, then you need to use a different network for the QSFP link (e.g., 192.168.200.11 and 192.168.200.12 with /24).
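To check for such a clash before picking addresses, list the IPv4 addresses a Spark already has with ip -4 addr show and compare them with your QSFP choice. The sketch below covers the /24 case only, where the network is simply the first three octets. same_subnet24 is a hypothetical helper name, and 192.168.100.57 is a made-up Wi-Fi address used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two IPv4 addresses share the same /24 network.
# Only valid for /24 masks, where the network is the first three octets.
same_subnet24() {
    # ${var%.*} strips the last dot and everything after it (the host octet).
    [ "${1%.*}" = "${2%.*}" ]
}

# Example: a made-up Wi-Fi address in 192.168.100.x against the QSFP address.
if same_subnet24 192.168.100.57 192.168.100.11; then
    echo "clash: pick another network for the QSFP link"
fi
```

For any prefix other than /24 this comparison is not sufficient, and you would need to compare the masked network addresses instead.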