Connect two sparks question

I purchased a 200G QSFP56 passive DAC cable from NODDOD to connect two Sparks (one DGX Spark, one Gigabyte autom). Can anyone verify the following connection result? What I don’t get is this: discovery finds two IPs for spark1 but only one IP for spark2. Is that normal? Also, the manual says not to use enP2p1, but it seems I have to use it. Is that OK? I believe the IPs I should use are 192.168.200.16 (spark1) and 192.168.200.15 (spark2). Can anyone confirm?

./discover-sparks
Found: 192.168.200.15 (spark2.local)
Found: 192.168.200.12 (spark1.local)
Found: 192.168.200.15 (spark2.local)
Found: 192.168.200.16 (spark1.local)

Setting up shared SSH access across all nodes…
You may be prompted for your password on each node.
Configuring 192.168.200.12…
✓ Successfully configured 192.168.200.12 with shared key
Configuring 192.168.200.15…
✓ Successfully configured 192.168.200.15 with shared key
Configuring 192.168.200.16…
✓ Successfully configured 192.168.200.16 with shared key

Shared SSH setup complete!
All nodes can now SSH to each other using the shared key (id_ed25519_shared).

spark1:
ibdev2netdev
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)

ip neigh show | grep 192.168.200
192.168.200.15 dev enP2p1s0f1np1 lladdr 48:21:0b:96:04:cf STALE
192.168.200.11 dev enP2p1s0f1np1 lladdr 48:21:0b:96:04:cb STALE

spark2:
ibdev2netdev
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)

ip neigh show | grep 192.168.200
192.168.200.16 dev enP2p1s0f0np0 lladdr 4c:bb:47:2d:1a:1b STALE
192.168.200.16 dev enp1s0f0np0 lladdr 4c:bb:47:2d:1a:1b STALE

Looks like you are using IPs from the same subnet on both “twin” interfaces, which causes a lot of issues.
Please use our Community Networking Guide to set it up properly.
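
As a rough illustration of what "same subnet on both twin interfaces" means, here is a minimal Python sketch using the standard `ipaddress` module. The function name and the interface-to-address mapping in `overlapping` are assumptions for illustration only (the output above does not show which interface held which address); `separate` uses one /24 network per interface, as in the netplan files posted later in this thread.

```python
import ipaddress
from collections import defaultdict


def find_subnet_conflicts(iface_addrs):
    """Group interfaces by the IPv4 network their address belongs to and
    return only the networks that more than one interface shares."""
    by_net = defaultdict(list)
    for iface, cidr in iface_addrs.items():
        net = ipaddress.ip_interface(cidr).network
        by_net[str(net)].append(iface)
    return {net: sorted(ifaces) for net, ifaces in by_net.items() if len(ifaces) > 1}


# Assumed mapping: both twin interfaces carry an address in 192.168.200.0/24.
overlapping = {
    "enp1s0f1np1": "192.168.200.12/24",
    "enP2p1s0f1np1": "192.168.200.16/24",
}

# One /24 per twin interface.
separate = {
    "enp1s0f1np1": "192.168.177.11/24",
    "enP2p1s0f1np1": "192.168.178.11/24",
}

print(find_subnet_conflicts(overlapping))
print(find_subnet_conflicts(separate))
```

An empty result means no two interfaces share a network, which is the layout to aim for.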

With Sparkrun, it’s very simple to configure a cluster (stack, mesh, or switch):

sparkrun cluster create mylab \
  --hosts 192.168.200.16,192.168.200.15 \
  --user dgxuser \
  -d "2-node DGX Spark cluster"

sparkrun cluster set-default mylab
sparkrun setup ssh
sparkrun setup cx7

sparkrun now has a wizard too! (I need to update those docs, they were written before the wizard).

sparkrun setup wizard – it’ll help you to create cluster configuration or walk you through the setup steps if you’ve already created it with sparkrun cluster create.

Thank you, I will check it. I’ve cleaned up 40-cx7.yaml on both nodes, so now there is only the 192.168.200.16 to 192.168.200.15 link. I still get a patch script error, see below. Any ideas?


:~/code/spark-vllm-docker$ ./run-recipe.sh qwen3.5-397b-int4-autoround.yaml --no-ray
Recipe: Qwen3.5-397B-INT4-Autoround
EXPERIMENTAL recipe for Qwen3.5-397B-INT4-Autoround (please refer to README for details! Use with --no-ray parameter!)

Using cluster nodes from .env: 192.168.200.15, 192.168.200.16

=== Launching ===
Container: vllm-node-tf5
Mods: mods/fix-qwen3.5-autoround, mods/fix-qwen3.5-chat-template, mods/gpu-mem-util-gb
Cluster: 2 nodes

Using launch script: /tmp/tmpsgkrrvct.sh
Detected Local IP: 192.168.200.16 (192.168.200.16/24)
Head Node: 192.168.200.16
Worker Nodes: 192.168.200.15
Container Name: vllm_node
Image Name: vllm-node-tf5
Action: exec
Checking SSH connectivity to worker nodes…
SSH to 192.168.200.15: OK
Starting Head Node on 192.168.200.16…
87c27b9ab3564f1b1beb11c1043d755392d739c3a3a69787cfa3ffbf23138f4d
Starting Worker Node on 192.168.200.15…
fbbe237b9af8d1aa045c189b1e767945915494065cafabe7d008af094b625408
Applying modifications to cluster nodes…
Applying mod ‘fix-qwen3.5-autoround’ to 192.168.200.16…
Copying directory content to container…
Successfully copied 4.1kB to vllm_node:/workspace/mods/fix-qwen3.5-autoround/
Running patch script on 192.168.200.16…
patching file transformers/modeling_rope_utils.py
Hunk #1 succeeded at 648 with fuzz 2.
Applying mod ‘fix-qwen3.5-chat-template’ to 192.168.200.16…
Copying directory content to container…
Successfully copied 11.3kB to vllm_node:/workspace/mods/fix-qwen3.5-chat-template/
Running patch script on 192.168.200.16…
=======> to apply chat template, use --chat-template unsloth.jinja
Applying mod ‘gpu-mem-util-gb’ to 192.168.200.16…
Copying directory content to container…
Successfully copied 16.9kB to vllm_node:/workspace/mods/gpu-mem-util-gb/
Running patch script on 192.168.200.16…
patching file vllm/config/cache.py
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 216 with fuzz 2 (offset 12 lines).
1 out of 2 hunks FAILED – saving rejects to file vllm/config/cache.py.rej
patching file vllm/engine/arg_utils.py
Hunk #1 succeeded at 453 (offset -1 lines).
Hunk #2 succeeded at 946 (offset -9 lines).
Hunk #3 succeeded at 1508 with fuzz 2 (offset -8 lines).
patching file vllm/entrypoints/llm.py
Hunk #2 FAILED at 239.
Hunk #3 succeeded at 357 with fuzz 2 (offset -4 lines).
1 out of 3 hunks FAILED – saving rejects to file vllm/entrypoints/llm.py.rej
patching file vllm/v1/core/kv_cache_utils.py
patching file vllm/v1/utils.py
patching file vllm/v1/worker/gpu_model_runner.py
Hunk #1 succeeded at 5129 (offset -226 lines).
Hunk #2 succeeded at 5208 (offset -226 lines).
patching file vllm/v1/worker/gpu_worker.py
Hunk #1 succeeded at 359 (offset 2 lines).
Hunk #2 succeeded at 372 (offset 2 lines).
patching file vllm/v1/worker/utils.py
Error: Patch script failed on 192.168.200.16

Stopping cluster…
Stopping head node (192.168.200.16)…
Stopping worker node (192.168.200.15)…
Cluster stopped.

Another one:


./run-recipe.sh qwen3.5-122b-int4-autoround
Recipe: Qwen3.5-122B-INT4-Autoround
vLLM serving Qwen3.5-122B-INT4-Autoround

Using cluster nodes from .env: 192.168.200.15, 192.168.200.16

=== Launching ===
Container: vllm-node-tf5
Mods: mods/fix-qwen3.5-autoround, mods/fix-qwen3.5-chat-template
Cluster: 2 nodes

Using launch script: /tmp/tmpv0bmx7po.sh
Detected Local IP: 192.168.200.16 (192.168.200.16/24)
Head Node: 192.168.200.16
Worker Nodes: 192.168.200.15
Container Name: vllm_node
Image Name: vllm-node-tf5
Action: exec
Checking SSH connectivity to worker nodes…
SSH to 192.168.200.15: OK
Starting Head Node on 192.168.200.16…
712d8515aace0fa6fbd6a030b1bc7aa8ca9eb05967ac67feac7c715032838641
Starting Worker Node on 192.168.200.15…
b40512ef6ee5467a575d4ac3dfe6410524ba7fcd521d5642f2ab1ab44055851d
Applying modifications to cluster nodes…
Applying mod ‘fix-qwen3.5-autoround’ to 192.168.200.16…
Copying directory content to container…
Successfully copied 4.1kB to vllm_node:/workspace/mods/fix-qwen3.5-autoround/
Running patch script on 192.168.200.16…
patching file transformers/modeling_rope_utils.py
Hunk #1 succeeded at 648 with fuzz 2.
Applying mod ‘fix-qwen3.5-chat-template’ to 192.168.200.16…
Copying directory content to container…
Successfully copied 11.3kB to vllm_node:/workspace/mods/fix-qwen3.5-chat-template/
Running patch script on 192.168.200.16…
=======> to apply chat template, use --chat-template unsloth.jinja
Applying mod ‘fix-qwen3.5-autoround’ to 192.168.200.15…
Copying mod package to 192.168.200.15:/tmp/vllm_mod_pkg_1775064228_22978…
run.sh                                                                                               100%   92   129.6KB/s   00:00
transformers.patch                                                                                   100%  596     1.0MB/s   00:00
Copying directory content to container…
Running patch script on 192.168.200.15…
patching file transformers/modeling_rope_utils.py
Hunk #1 FAILED at 648.
1 out of 1 hunk FAILED – saving rejects to file transformers/modeling_rope_utils.py.rej
Error: Patch script failed on 192.168.200.15

Stopping cluster…
Stopping head node (192.168.200.16)…
Stopping worker node (192.168.200.15)…
Cluster stopped.
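
For long `patch` transcripts like the two above, a small summary of which files had rejected hunks can make the failure easier to discuss. This is a minimal sketch, not part of spark-vllm-docker; the function name is hypothetical and the sample text is a shortened copy of the first log. Note that in the second log the same `transformers.patch` applies on 192.168.200.16 but fails on 192.168.200.15, which may indicate that the two nodes are running different builds of the container image (that is a guess, not something the logs confirm).

```python
import re

FILE_RE = re.compile(r"^patching file (\S+)")
FAIL_RE = re.compile(r"^Hunk #(\d+) FAILED at (\d+)")


def failed_hunks(patch_output):
    """Map each patched file to the list of hunk numbers that FAILED."""
    failures = {}
    current = None
    for raw in patch_output.splitlines():
        line = raw.strip()
        m = FILE_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = FAIL_RE.match(line)
        if m and current is not None:
            failures.setdefault(current, []).append(int(m.group(1)))
    return failures


# Shortened excerpt of the gpu-mem-util-gb patch output from the first run.
sample = """\
patching file vllm/config/cache.py
Hunk #1 FAILED at 45.
Hunk #2 succeeded at 216 with fuzz 2 (offset 12 lines).
patching file vllm/engine/arg_utils.py
Hunk #1 succeeded at 453 (offset -1 lines).
patching file vllm/entrypoints/llm.py
Hunk #2 FAILED at 239.
Hunk #3 succeeded at 357 with fuzz 2 (offset -4 lines).
"""

for path, hunks in failed_hunks(sample).items():
    print(f"{path}: failed hunks {hunks}")
```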

I got sparkrun running with qwen3-1.7b-vllm:

sparkrun status
Job: @sparkrun-transitional/qwen3-1.7b-vllm  (tp=2)  [7e9809c9fc5c]  (2 container(s))
node_0     192.168.177.11                           Up 14 minutes             sparkrun-eugr-vllm-tf5
node_1     192.168.177.12                           Up 14 minutes             sparkrun-eugr-vllm-tf5
logs: sparkrun logs 7e9809c9fc5c
stop: sparkrun stop 7e9809c9fc5c

Total: 2 container(s) across 2 host(s)

The problem is that I don’t think it launched correctly.


nvidia-smi
Thu Apr  2 06:22:05 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0  On |                  N/A |
| N/A   51C    P0             12W /  N/A  | Not Supported          |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            4071      G   /usr/lib/xorg/Xorg                      135MiB |
|    0   N/A  N/A            4218      G   /usr/bin/gnome-shell                    140MiB |
|    0   N/A  N/A          380635      G   …/8051/usr/lib/firefox/firefox        229MiB |
|    0   N/A  N/A          386026      C   VLLM::Worker                            289MiB |
|    0   N/A  N/A          386585      G   /usr/share/cursor/cursor                 70MiB |
+-----------------------------------------------------------------------------------------+

It never fully loaded the GPU.

As for spark-vllm-docker, my issue is with run-recipe.sh --discover. I like the auto-discovery, and this is what it generated:

# Auto-generated by autodiscover.sh

CLUSTER_NODES=192.168.177.11,192.168.177.12
COPY_HOSTS=192.168.177.12
LOCAL_IP=192.168.177.11
ETH_IF=enp1s0f1np1
IB_IF=rocep1s0f1,roceP2p1s0f1

Above is the head node’s .env. enp1s0f1np1 is not the ETH_IF for the worker, but I think the worker uses this value by default, so the Docker setup will not work. What should I do? Do I need to run the same script on the worker to generate its own .env?

spark1:
network:
  version: 2
  ethernets:
    enp1s0f1np1:
      dhcp4: no
      dhcp6: no
      link-local:
      mtu: 9000
      addresses: [192.168.177.11/24]
    enP2p1s0f1np1:
      dhcp4: no
      dhcp6: no
      link-local:
      mtu: 9000
      addresses: [192.168.178.11/24]
ibdev2netdev
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)

spark2:
network:
  version: 2
  ethernets:
    enp1s0f0np0:
      dhcp4: no
      dhcp6: no
      link-local:
      mtu: 9000
      addresses: [192.168.177.12/24]
    enP2p1s0f0np0:
      dhcp4: no
      dhcp6: no
      link-local:
      mtu: 9000
      addresses: [192.168.178.12/24]
spark2:~$ ibdev2netdev
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)
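
To make the mismatch concrete, here is a minimal Python sketch that derives `ETH_IF` and `IB_IF` values from each node's `ibdev2netdev` output. It is not how autodiscover.sh works internally (that is not shown in this thread); the function names and the rule "use the first Up interface as ETH_IF" are assumptions for illustration. The two sample strings are copied from the outputs above.

```python
import re

LINE_RE = re.compile(r"^(\S+) port (\d+) ==> (\S+) \((Up|Down)\)$")

SPARK1 = """\
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)
"""

SPARK2 = """\
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)
"""


def up_links(ibdev2netdev_output):
    """Return (rdma_device, net_interface) pairs whose link state is Up."""
    pairs = []
    for raw in ibdev2netdev_output.splitlines():
        m = LINE_RE.match(raw.strip())
        if m and m.group(4) == "Up":
            pairs.append((m.group(1), m.group(3)))
    return pairs


def env_lines(ibdev2netdev_output):
    """Build ETH_IF / IB_IF lines in the style of the .env above.

    Assumption: the first Up interface is the one to use for ETH_IF.
    """
    pairs = up_links(ibdev2netdev_output)
    if not pairs:
        raise ValueError("no Up links found")
    eth_if = pairs[0][1]
    ib_if = ",".join(dev for dev, _ in pairs)
    return [f"ETH_IF={eth_if}", f"IB_IF={ib_if}"]


print("spark1:", env_lines(SPARK1))
print("spark2:", env_lines(SPARK2))
```

For spark1 this reproduces the interface values in the generated .env, while spark2 ends up with the `f0` ports, so a single shared set of interface names cannot describe both nodes with this cabling.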

Did you try running:
sparkrun logs 7e9809c9fc5c
?

The status command gives you a reference for the “safe” way to pull up logs or stop the inference job.

You can look at the logs from vllm and see what it was saying – and we can use that to figure out what to do.

I was able to connect two sparks finally, thanks.
I can run Qwen3.5-35b on the cluster with no issue, but I am having problems with dense models on the cluster. On a single node the responses are good, but on the cluster the model generates tokens that seem to loop and are corrupted. Below is Qwen3.5-2b on the cluster. I don’t know why it uses so much VRAM for such a small model.

Fri Apr  3 15:25:55 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0  On |                  N/A |
| N/A   61C    P0             29W /  N/A  | Not Supported          |     88%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1212      C   python3                                 223MiB |
|    0   N/A  N/A            1276      C   sglang::scheduler_TP0                 93513MiB |
+-----------------------------------------------------------------------------------------+

Fri Apr  3 15:26:29 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   59C    P0             23W /  N/A  | Not Supported          |     90%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A             131      C   python3                                 170MiB |
|    0   N/A  N/A             230      C   sglang::scheduler_TP1                 93537MiB |
+-----------------------------------------------------------------------------------------+




Below are the logs on the master node:

[2026-04-03 15:32:30 TP0] Decode batch, #running-req: 1, #full token: 26427, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 42.25, #queue-req: 0
[2026-04-03 15:32:31 TP0] Decode batch, #running-req: 1, #full token: 26467, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 43.38, #queue-req: 0
[2026-04-03 15:32:32 TP0] Decode batch, #running-req: 1, #full token: 26507, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 40.46, #queue-req: 0
[2026-04-03 15:32:33 TP0] Decode batch, #running-req: 1, #full token: 26547, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.14, #queue-req: 0
[2026-04-03 15:32:34 TP0] Decode batch, #running-req: 1, #full token: 26587, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 40.92, #queue-req: 0
[2026-04-03 15:32:35 TP0] Decode batch, #running-req: 1, #full token: 26627, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.96, #queue-req: 0
[2026-04-03 15:32:36 TP0] Decode batch, #running-req: 1, #full token: 26667, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.97, #queue-req: 0
[2026-04-03 15:32:36 TP0] Decode batch, #running-req: 1, #full token: 26707, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.47, #queue-req: 0
[2026-04-03 15:32:37 TP0] Decode batch, #running-req: 1, #full token: 26747, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 40.66, #queue-req: 0
[2026-04-03 15:32:38 TP0] Decode batch, #running-req: 1, #full token: 26787, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.91, #queue-req: 0
[2026-04-03 15:32:39 TP0] Decode batch, #running-req: 1, #full token: 26827, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.15, #queue-req: 0
[2026-04-03 15:32:40] INFO: 100.109.56.33:59816 - "GET /metrics HTTP/1.1" 200 OK
[2026-04-03 15:32:40 TP0] Decode batch, #running-req: 1, #full token: 26867, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 42.71, #queue-req: 0
[2026-04-03 15:32:41 TP0] Decode batch, #running-req: 1, #full token: 26907, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.62, #queue-req: 0
[2026-04-03 15:32:42 TP0] Decode batch, #running-req: 1, #full token: 26947, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.73, #queue-req: 0
[2026-04-03 15:32:43 TP0] Decode batch, #running-req: 1, #full token: 26987, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 41.84, #queue-req: 0
[2026-04-03 15:32:44 TP0] Decode batch, #running-req: 1, #full token: 27027, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 40.28, #queue-req: 0
[2026-04-03 15:32:45 TP0] Decode batch, #running-req: 1, #full token: 27067, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 39.29, #queue-req: 0
[2026-04-03 15:32:46 TP0] Decode batch, #running-req: 1, #full token: 27107, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 35.12, #queue-req: 0
[2026-04-03 15:32:48 TP0] Decode batch, #running-req: 1, #full token: 27147, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 34.52, #queue-req: 0
[2026-04-03 15:32:49 TP0] Decode batch, #running-req: 1, #full token: 27187, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 34.70, #queue-req: 0
[2026-04-03 15:32:50] INFO: 100.109.56.33:38878 - "GET /metrics HTTP/1.1" 200 OK
[2026-04-03 15:32:50 TP0] Decode batch, #running-req: 1, #full token: 27227, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 35.32, #queue-req: 0
[2026-04-03 15:32:51 TP0] Decode batch, #running-req: 1, #full token: 27267, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 36.75, #queue-req: 0
[2026-04-03 15:32:52 TP0] Decode batch, #running-req: 1, #full token: 27307, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 37.04, #queue-req: 0
[2026-04-03 15:32:53 TP0] Decode batch, #running-req: 1, #full token: 27347, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 35.37, #queue-req: 0
[2026-04-03 15:32:54 TP0] Decode batch, #running-req: 1, #full token: 27387, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 36.12, #queue-req: 0
[2026-04-03 15:32:55 TP0] Decode batch, #running-req: 1, #full token: 27427, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 36.27, #queue-req: 0
[2026-04-03 15:32:56 TP0] Decode batch, #running-req: 1, #full token: 27467, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 38.29, #queue-req: 0
[2026-04-03 15:32:57 TP0] Decode batch, #running-req: 1, #full token: 27507, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 37.97, #queue-req: 0
[2026-04-03 15:32:59 TP0] Decode batch, #running-req: 1, #full token: 27547, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 36.89, #queue-req: 0
[2026-04-03 15:33:00 TP0] Decode batch, #running-req: 1, #full token: 27587, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 37.83, #queue-req: 0
[2026-04-03 15:33:00] INFO: 100.109.56.33:41102 - "GET /metrics HTTP/1.1" 200 OK
[2026-04-03 15:33:01 TP0] Decode batch, #running-req: 1, #full token: 27627, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 38.17, #queue-req: 0
[2026-04-03 15:33:02 TP0] Decode batch, #running-req: 1, #full token: 27667, full token usage: 0.00, mamba num: 2, mamba usage: 0.00, cuda graph: True, gen throughput (token/s): 37.50, #queue-req: 0
[2026-04-03 15:33:03 TP0] Decode batch, #running-req: 1, #full token: 27707, full token usage: 0.00, mamba num:

The RAM usage is basically configured as part of the recipe and is typically quite high (like 0.8) meaning that it’ll consume 80% RAM. Taking up that much RAM is independent of the size of the model. So it’s something you can configure.

For example, if you’re doing:

sparkrun run @sparkrun-transitional/qwen3-1.7b-vllm --tp 2

You could either download and modify the recipe file or override the value at the CLI:

sparkrun run @sparkrun-transitional/qwen3-1.7b-vllm --tp 2 --gpu-mem 0.3
(which would reduce usage to 30% instead of the recipe default)

The amount of RAM assigned is important to both fit the model and the KV cache associated with usage. So you’ll need to assign a lot if you have long context lengths and want to have a lot of simultaneous requests. If you don’t need as many simultaneous requests or long context length, then it should be safe to assign less memory.
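
To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch. Everything in it is an assumption for illustration: the function name is hypothetical, 120 GiB is a round figure standing in for the usable memory on one node, and 4 GiB is a placeholder for the weights of a small model. Real engines also need room for activations, CUDA graphs, and other overhead, so treat the "left for KV cache" figure as an upper bound.

```python
def memory_budget_gib(total_gib, fraction, weights_gib):
    """Return (reserved, left_for_kv_cache) in GiB for a memory fraction.

    reserved          = total_gib * fraction
    left_for_kv_cache = reserved - weights_gib, floored at zero
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    reserved = total_gib * fraction
    return reserved, max(reserved - weights_gib, 0.0)


TOTAL_GIB = 120.0    # assumed usable memory per node
WEIGHTS_GIB = 4.0    # assumed placeholder for a small model's weights

for fraction in (0.8, 0.3):
    reserved, kv = memory_budget_gib(TOTAL_GIB, fraction, WEIGHTS_GIB)
    print(f"fraction={fraction}: reserved ~{reserved:.0f} GiB, "
          f"up to ~{kv:.0f} GiB left for KV cache")
```

The reserved amount depends only on the fraction, not on the model size, which is why a small model can still show tens of GiB in nvidia-smi.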

Also, when you drop logs into the forum, it’s much easier to read them if you put it in markdown preformatted text blocks:

Either three backticks, NEWLINE, Paste logs, NEWLINE, three backticks;

or after paste, select the logs text and click the </> icon in the forum editor. It’ll make it much easier to read.
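
If the logs have already been flattened into one long line, with leftover colour codes such as `e[32m`, a small script can tidy them before pasting. This is only a sketch: the function names are hypothetical, the colour-code pattern is a heuristic (it also strips the literal `e[..m` residue left when the escape character is lost), and it assumes every entry starts with a `[YYYY-MM-DD HH:MM:SS` timestamp.

```python
import re

# Real ANSI colour codes, or the "e[32m"-style residue left when ESC is lost.
COLOUR_RE = re.compile(r"(?:\x1b|e)\[[0-9;]*m")
# A single space that is followed by a "[YYYY-MM-DD HH:MM:SS" timestamp.
ENTRY_START_RE = re.compile(r" (?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def tidy_log(raw):
    """Strip colour codes and return one log entry per list item."""
    cleaned = COLOUR_RE.sub("", raw)
    flat = " ".join(cleaned.split())  # rejoin entries wrapped across lines
    return [entry for entry in ENTRY_START_RE.split(flat) if entry]


def as_markdown_block(raw):
    """Wrap the tidied entries in a preformatted block for the forum."""
    fence = "`" * 3
    return "\n".join([fence, *tidy_log(raw), fence])


sample = (
    '[2026-04-03 15:32:39 TP0] Decode batch, #running-req: 1 '
    '[2026-04-03 15:32:40] e[32mINFOe[0m: 100.109.56.33:59816 - '
    '"e[1mGET /metrics HTTP/1.1e[0m" e[32m200 OKe[0m '
    '[2026-04-03 15:32:40 TP0] Decode batch, #running-req: 1'
)

print(as_markdown_block(sample))
```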

Still having corruption issues on clustering?