DGX Spark Multi-Node LLM Inference Report for Qwen3-235B model

DGX Spark Multi-Node LLM Inference Report

Date: December 17, 2025
System: 2x NVIDIA DGX Spark (GB10 GPU - Blackwell SM121)
Goal: Run Qwen3-235B model with multi-node distributed inference


CRITICAL FINDING: Native Solution Failed, Workaround Used

NVIDIA’s native multi-node inference stack designed for DGX Spark is NOT ready for GB10/SM121. The working solution in this report was achieved through a workaround, not the intended native path.

Native vs Workaround Comparison

Method Status Issue Performance Impact
vLLM + Ray (native tensor parallelism) FAILED GB10 not recognized as GPU resource -
TensorRT-LLM + NVFP4 (native NVIDIA stack) FAILED SM121 GEMM kernels missing -
llama.cpp + RPC (workaround) WORKING Uses TCP/IP, not NCCL ~1-2μs extra latency

Expected Native Flow (Did NOT Work):

vLLM → Ray → NCCL → NVLink/ConnectX-7 → Native Tensor Parallelism
                    (39.34 GB/s)

Workaround Flow Used:

llama.cpp → RPC → TCP/IP → Manual Layer Splitting
                  (added latency)

Performance Implications

  • NCCL test: 39.34 GB/s throughput (hardware is working)
  • RPC backend: Running over TCP/IP, NOT using NCCL
  • Potential loss: Native solution could have been estimated 2-3x faster
  • Current performance: 12.5 t/s (better than NVIDIA’s 11.73 t/s benchmark, but below native potential)

Note to NVIDIA: vLLM/Ray integration for GB10 GPUs and SM121 NVFP4 kernel support should be critical priorities. The hardware can deliver 39 GB/s NCCL throughput, but the software stack cannot utilize it.


1. System Specifications

Hardware

Node IP (QSFP) GPU GPU Memory CPU RAM
dgxnode1 169.254.1.1 NVIDIA GB10 128GB UMA (~117GB usable) 20 core 119GB
dgxnode2 169.254.1.2 NVIDIA GB10 128GB UMA (~117GB usable) 20 core 119GB

Network

  • Connection: QSFP 200GbE direct cable
  • MTU: 9000 (Jumbo frames)
  • Subnet: 169.254.0.0/16 (link-local)

Software Environment

  • OS: Ubuntu 24.04 (Linux 6.14.0-1015-nvidia)
  • CUDA: 13.0+ (SM121 Blackwell support)
  • Driver: NVIDIA Container Toolkit

2. SUCCESSFUL OPERATIONS

2.1 NCCL Multi-Node Communication

Status: SUCCESS

NCCL version: 2.28.9-1
Test: nccl_message_transfer (all_reduce)
Performance: 39.34 GB/s throughput

Steps Taken:

  1. Configured MTU 9000 (jumbo frames)
  2. Set up Docker network configuration
  3. Configured NCCL environment variables
  4. Verified with nccl-tests

2.2 llama.cpp Build (CUDA + RPC Support)

Status: SUCCESS

Build Commands:

git clone --depth 1 https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DGGML_RPC=ON -DLLAMA_CURL=OFF
cmake --build build --config Release -j$(nproc)

Result:

  • llama-server, llama-cli, rpc-server binaries created
  • Compiled with SM121 (Blackwell) support
  • Copied to both nodes (rsync)

2.3 spark-vllm-docker Build

Status: SUCCESS

Duration: 50 minutes 30 seconds
Image: vllm-spark:latest (23.4GB)

Features:

  • Based on vLLM v0.12.0
  • Compiled with SM121 CUDA kernels
  • NVFP4 and AWQ quantization support
  • Optimized with Triton compiler

2.4 Model Download (Qwen3-235B Q4_K_XL)

Status: SUCCESS

Model: unsloth/Qwen3-235B-A22B-GGUF (UD-Q4_K_XL quantization)
Size: 134GB (3 split files)
Duration: ~20 minutes (with hf_transfer)

Files:

/home/user/models/UD-Q4_K_XL/
├── Qwen3-235B-A22B-UD-Q4_K_XL-00001-of-00003.gguf (47GB)
├── Qwen3-235B-A22B-UD-Q4_K_XL-00002-of-00003.gguf (47GB)
└── Qwen3-235B-A22B-UD-Q4_K_XL-00003-of-00003.gguf (33GB)

Download Optimizations:

  • Enabled hf_transfer library
  • HuggingFace token authentication
  • HF_HUB_ENABLE_HF_TRANSFER=1 environment variable

2.5 Single GPU Test (dgxnode1)

Status: SUCCESS

Performance: ~1.8 tokens/sec

Command:

./llama-server -m "$MODEL" -ngl 999 --host 0.0.0.0 --port 8082 -c 2048

Results:

  • 95/95 layers loaded to GPU
  • CUDA0 buffer: 115GB
  • Model running entirely in GPU memory

2.6 Multi-Node RPC Test (2x DGX)

Status: SUCCESS

Performance: ~12.5 tokens/sec (7x speedup!)

Command:

./llama-server \
  -m "$MODEL" \
  --rpc "169.254.1.2:50052" \
  -ngl 999 \
  -fit off \
  --host 0.0.0.0 \
  --port 8082 \
  -c 2048

Memory Distribution:

  • CUDA0 (dgxnode1): 63GB
  • RPC0 (dgxnode2): 64.5GB
  • CPU Mapped: 334MB

API Test:

{
  "prompt_per_second": 37.73,
  "predicted_per_second": 12.50,
  "total_tokens": 164
}

3. FAILED / PROBLEMATIC OPERATIONS

3.1 vLLM Ray Distributed Backend

Status: FAILED
Error: “Current node has no GPU available”

Root Cause:
Ray cluster registers GPUs as accelerator_type:GB10, but vLLM v1 engine expects the GPU resource key. This is a resource mapping issue.

Details:

Ray Node Resources:
- CPU: 20.0
- memory: 68GB
- accelerator_type:GB10: 1.0
- GPU: (MISSING!)  <-- This is the problem

Attempted Solutions:

  1. VLLM_USE_V1=0 (legacy engine) - Did not work
  2. Ray cluster restart - Did not work
  3. Placement group cleanup - Did not work

Potential Fixes:

  • Force GPU detection via CUDA_VISIBLE_DEVICES
  • Patch vLLM’s Ray resource detection code
  • Test if spark-vllm-docker image resolves this issue

Recommendation for NVIDIA/vLLM Team:
The GB10 GPU is not being recognized as a standard GPU resource in Ray. vLLM’s worker initialization fails because it looks for GPU resource type, but Ray only registers accelerator_type:GB10. This needs either:

  1. Ray to also register a generic GPU resource for GB10
  2. vLLM to recognize accelerator_type:GB10 as a valid GPU resource

3.2 NVFP4 Quantization

Status: FAILED
Error: “Failed to initialize GEMM Plugin”

Root Cause:
NVFP4 kernels are written for SM90 (Hopper) and do not support SM121 (Blackwell).

Details:

NVFP4 FP8 GEMM kernel not found
No compatible kernel for SM121

Recommendation for NVIDIA Team:
SM121 (Blackwell/GB10) needs NVFP4 GEMM kernel support. Currently only SM90 (Hopper) kernels are available in TensorRT-LLM and vLLM.

Workaround Used: AWQ quantization (32% faster than NVFP4 on DGX Spark anyway)


3.3 llama.cpp RPC Segmentation Fault

Status: RESOLVED (with workaround)

Error: Segfault during “fitting params to device memory”

Root Cause:
llama.cpp’s automatic memory fitting algorithm is incompatible with the RPC backend.

Solution:
Added -fit off parameter to disable automatic fitting.

# Does NOT work:
./llama-server -m $MODEL --rpc "..." -ngl 999

# WORKS:
./llama-server -m $MODEL --rpc "..." -ngl 999 -fit off

Recommendation for llama.cpp Team:
The automatic memory fitting feature crashes when RPC backend is enabled. Consider adding RPC-aware memory fitting or documenting this limitation.


3.4 dgxnode2 Docker GPU Access

Status: RESOLVED

Error: “Failed to initialize NVML: Unknown Error”

Root Cause:
/etc/docker/daemon.json file was missing on dgxnode2 (NVIDIA Container Toolkit not configured).

Solution:

# Copied from dgxnode1 to dgxnode2
scp /etc/docker/daemon.json dgxnode2:/etc/docker/
ssh dgxnode2 "systemctl daemon-reload && systemctl restart docker"

daemon.json contents:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

Note: This may be a DGX Spark setup issue - the second node did not have Docker properly configured for GPU access out of the box.


3.5 aria2c Download Issue

Status: RESOLVED

Problem:
When starting download with aria2c, there was incompatibility with previous hf download data. Files appeared as sparse.

Solution:
Used hf download with hf_transfer instead of aria2c.


4. PERFORMANCE COMPARISON

Configuration Tokens/sec Notes
CPU Only (single node) ~0.1 Too slow, not practical
Single GPU (dgxnode1) ~1.8 Entire model 115GB in GPU
Multi-node (2x DGX RPC) ~12.5 7x speedup

Prompt Processing: 37.7 tokens/sec
Token Generation: 12.5 tokens/sec


5. CURRENT WORKING STATE

Running Services

Service Host Port Status
llama-server dgxnode1 8082 RUNNING
rpc-server dgxnode1 50052 RUNNING
rpc-server dgxnode2 50052 RUNNING

API Usage

curl http://dgxnode1:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json

Example Request

{
  "model": "qwen3-235b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "max_tokens": 100
}

6. NOTES FOR FUTURE UPDATES

Items to Check on Driver/Software Updates

  1. vLLM Ray GPU Resource Issue

    • vLLM version: 0.12.0
    • Ray version: 2.x
    • Issue: accelerator_type:GB10 vs GPU resource mapping
    • May be fixed in newer versions
  2. NVFP4 SM121 Support

    • TensorRT-LLM and vLLM need SM121 NVFP4 kernel support
    • May come with CUDA 13.x updates
  3. llama.cpp RPC Stability

    • -fit off workaround required
    • May be fixed in newer versions

Recommended Configuration (Production)

# Start RPC Servers (on each node)
cd ~/llama.cpp/build/bin
export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
./rpc-server -H <NODE_IP> -p 50052

# Start llama-server (on master node)
./llama-server \
  -m /home/user/models/UD-Q4_K_XL/Qwen3-235B-A22B-UD-Q4_K_XL-00001-of-00003.gguf \
  --rpc "169.254.1.2:50052" \
  -ngl 999 \
  -fit off \
  --host 0.0.0.0 \
  --port 8082 \
  -c 4096

7. FILE LOCATIONS

File/Directory Location
llama.cpp build /home/user/llama.cpp/build/bin/
llama.cpp (dgxnode2) /home/user/llama.cpp-build/bin/
Qwen3-235B model /home/user/models/UD-Q4_K_XL/
spark-vllm-docker /home/user/spark-vllm-docker/
vLLM image vllm-spark:latest
Ray cluster script /home/user/run_cluster.sh

8. CONCLUSION

Successfully ran Qwen3-235B (235 billion parameters) model on 2x DGX Spark (GB10 Blackwell GPU) system.

Key Achievements:

  • Multi-node distributed inference with llama.cpp RPC backend
  • 12.5 tokens/sec performance (7x speedup compared to single GPU)
  • 127GB model memory distributed across two GPUs

Outstanding Issues:

  • vLLM Ray backend issue (GPU resource mapping) - Needs fix from vLLM/Ray team
  • NVFP4 quantization support (SM121 kernel missing) - Needs fix from NVIDIA team

Recommendation:
For production environments, the llama.cpp RPC solution is stable and performant. For vLLM solution, newer versions need to be monitored for GB10/SM121 compatibility fixes.


9. SUMMARY FOR NVIDIA TEAM

Critical Issues Requiring Attention:

  1. GB10 GPU Not Recognized as “GPU” Resource in Ray

    • Impact: vLLM multi-node inference completely broken
    • Workaround: None (had to use llama.cpp instead)
    • Suggested Fix: Ensure Ray registers GB10 as both accelerator_type:GB10 AND generic GPU resource
  2. Missing NVFP4 Kernels for SM121

    • Impact: Cannot use NVFP4 quantization on DGX Spark
    • Workaround: Use AWQ quantization
    • Suggested Fix: Add SM121 GEMM kernels to TensorRT-LLM/vLLM
  3. Docker daemon.json Missing on Second DGX Spark Node

    • Impact: GPU not accessible in Docker containers
    • Workaround: Manually copy config from first node
    • Suggested Fix: Ensure NVIDIA Container Toolkit is properly configured on all nodes during DGX setup

10. TRACKING LINKS FOR UPDATES

The following GitHub issues and forum threads should be monitored for fixes to the problems encountered in this report.

Critical Priority - Must Watch

Link Description Status
vLLM #30163 NVFP4 on 2x DGX Spark - Exact same scenario OPEN
vLLM #12614 “Current node has no GPU available” - Main issue OPEN
llama.cpp #13083 Tensor Parallelism over RPC - Would give 2-3x speedup OPEN

vLLM + Ray GPU Detection Issues

Link Description
vLLM #13093 Ray distributed “no GPU” error
vLLM #14109 Fractional GPU resource name issue
Ray #59064 Ray Serve + vLLM v1 placement group conflict

NVIDIA Forum - DGX Spark / GB10 Issues

Link Description
Two Sparks Does Not Work Multi-node vLLM issue
NIM Containers Fail on SM121 Triton/vLLM SM121 crash
vLLM Container Issue Container problems
vLLM Forums - DGX Spark vLLM forum discussion

TensorRT-LLM / NVFP4 SM121 Support

Link Description
TensorRT-LLM Releases Watch for SM121 kernel support
TensorRT-LLM #3591 Blackwell + FP4 issue
TensorRT-LLM #5018 RTX 5090 NVFP4 support
Support Matrix Official supported GPUs

llama.cpp RPC Improvements

Link Description
llama.cpp #9086 Tensor Parallelism support
llama.cpp #15463 RPC dual-node 50% GPU utilization bug
DGX Spark Discussion DGX Spark benchmarks

Release Pages (Check Weekly)

Link Description
llama.cpp Releases New versions
vLLM Releases vLLM updates
TensorRT-LLM Releases TRT-LLM updates
DGX Spark Forum Main DGX Spark forum

Report Date: December 17, 2025
System: dgxnode1 + dgxnode2 (2x DGX Spark)
Author: Testing multi-node LLM inference on DGX Spark cluster

3 Likes

appreciate this detailed writeup. we will attempt local repro and communicate with eng as needed. thank you

1 Like

This is a very long LLM-produced report, and it’s hard to read through it, but dual spark inference works just fine with vLLM and SGLang if you do it properly.

I’m not sure if spark-vllm-docker build was referencing my repository below, but it has no issues running models in a distributed fashion. I suggest you use an AWQ quant for now (e.g. QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ) as NVFP4 is still not optimized on Spark.

2 Likes

Your GPU issue with vLLM is probably related to how you launch Docker - make sure you use --gpus=all. You need to also ensure you use host networking and run container as privileged or pass infiniband device to it. The easiest way is to just follow instructions in my repository - just use your actual network interface names and IP addresses.

Hi baristankut,

We’re in the progress of communicating all of this with engineering; in this effort, I’d like to know the exact commands from beginning to end that you used to produce the Ray/vLLM and TensorRT-LLM quantization errors, and the exact error messages. Thank you for your contributions thus far!

Yes, we used your spark-vllm-docker repo and successfully built it (took 50 minutes). Thanks for the AWQ recommendation - we got ~16.4 t/s with Qwen2.5-32B-AWQ on single node.

However, our issue was not with Docker or the image. Ray cluster was properly set up, both nodes showed 2 GPUs and 218GB memory. The issue was:

Ray Node Resources:

  • CPU: 20.0
  • memory: 68GB
  • accelerator_type:GB10: 1.0
  • GPU: (MISSING!) ← Problem here

Ray registered the GPU as accelerator_type:GB10 but vLLM v1 engine expects a GPU resource key. This is a resource mapping issue between Ray and vLLM.

We did use --gpus all, --network host, --ipc=host, and --privileged. Here’s our exact Docker command:

docker run -d \
–name vllm-head \
–gpus all \
–network host \
–ipc=host \
–ulimit memlock=-1 \
-e NCCL_SOCKET_IFNAME=enp1s0f1np1 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
nvcr.io/nvidia/vllm:25.11-py3 \
bash -c “ray start --head --port=6379 --node-ip-address=169.254.254.1 && sleep infinity”

GPU was visible inside the container (nvidia-smi worked fine). The problem was specifically Ray’s resource registration for GB10 - it shows accelerator_type:GB10 but not the generic GPU resource that vLLM expects.

We also tried NVIDIA’s official run_cluster.sh script with same result.

vLLM/Ray Exact Commands:

  1. Head Node (dgxnode1):
    export VLLM_IMAGE=nvcr.io/nvidia/vllm:25.11-py3
    export MN_IF_NAME=enp1s0f1np1
    export VLLM_HOST_IP=169.254.254.1

bash run_cluster.sh $VLLM_IMAGE $VLLM_HOST_IP --head ~/.cache/huggingface \
-e VLLM_HOST_IP=$VLLM_HOST_IP \
-e UCX_NET_DEVICES=$MN_IF_NAME \
-e NCCL_SOCKET_IFNAME=$MN_IF_NAME \
-e GLOO_SOCKET_IFNAME=$MN_IF_NAME \
-e MASTER_ADDR=$VLLM_HOST_IP

  1. Worker Node (dgxnode2):
    export MN_IF_NAME=enp1s0f1np1
    export VLLM_HOST_IP=169.254.106.137
    export HEAD_NODE_IP=169.254.254.1

bash run_cluster.sh $VLLM_IMAGE $HEAD_NODE_IP --worker ~/.cache/huggingface \
-e VLLM_HOST_IP=$VLLM_HOST_IP \
-e UCX_NET_DEVICES=$MN_IF_NAME \
-e NCCL_SOCKET_IFNAME=$MN_IF_NAME \
-e GLOO_SOCKET_IFNAME=$MN_IF_NAME \
-e MASTER_ADDR=$HEAD_NODE_IP

  1. vLLM Serve Command:
    docker exec -it $VLLM_CONTAINER vllm serve \
    QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ \
    –tensor-parallel-size 2 \
    –distributed-executor-backend ray \
    –gpu-memory-utilization 0.7 \
    –max-model-len 32768

Ray Status Output (Successful):
Nodes: 2
Resources:
CPU: 40.0/40.0
memory: 218.88 GiB
accelerator_type:GB10: 2.0
object_store_memory: 19.45 GiB

vLLM Error:
ValueError: Current node has no GPU available.
current_node_resource={‘node:169.254.254.1’: 1.0, ‘CPU’: 20.0, ‘memory’: …}

Root Cause: Ray registers GB10 as accelerator_type:GB10 but vLLM v1 expects GPU resource key.


TensorRT-LLM/NVFP4 Commands:

docker run -d \
–name trtllm-multinode \
–gpus all \
–network host \
–ipc=host \
-e UCX_NET_DEVICES=“enp1s0f1np1” \
-e NCCL_SOCKET_IFNAME=“enp1s0f1np1” \
-v ~/models:/models \
nvcr.io/nvidia/tritonserver:25.04-trtllm-python-py3 \
sleep infinity

Inside container:

trtllm-serve nvidia/Qwen3-235B-A22B-FP4 --tp_size 2 --backend pytorch

NVFP4 Error:
Failed to initialize GEMM Plugin
NVFP4 FP8 GEMM kernel not found for SM121

Root Cause: NVFP4 GEMM kernels are compiled for SM90 (Hopper), not SM121 (Blackwell/GB10).

You said you used my Docker, but you are running NVIDIA one instead…

Can you try to follow instructions in my repo? I’ve even added a script that launches the cluster (and vllm command if needed) with interface autodetection, etc.

I don’t know what is causing your issue, but even if it worked, you’d be missing on performance, because you haven’t set IB devices, only management interface (ethernet).

Here is what it looks like when I run it on my system:

eugr@spark:~$ spark-vllm/launch-cluster.sh -t vllm-node-20251218 exec bash
Auto-detecting interfaces...
  Detected IB_IF: rocep1s0f1,roceP2p1s0f1
  Detected ETH_IF: enp1s0f1np1
Auto-detecting nodes...
  Detected Local IP: 192.168.177.11 (192.168.177.11/24)
  Scanning for SSH peers on 192.168.177.11/24...
  Found peer: 192.168.177.12
  Cluster Nodes: 192.168.177.11,192.168.177.12
Head Node: 192.168.177.11
Worker Nodes: 192.168.177.12
Container Name: vllm_node
Action: exec
Checking SSH connectivity to worker nodes...
  SSH to 192.168.177.12: OK
Starting Head Node on 192.168.177.11...
d844ae5bd989d81763be55d72ef715b187c534b91690a5909eac7825f3447998
Starting Worker Node on 192.168.177.12...
0f43d1941bbce6938377dd522b462d21f8e6d5af9363fd826fb8cc41a0cee7a3
Waiting for cluster to be ready...
Cluster head is responsive.
Executing command on head node: bash
root@spark:/workspace/vllm# ray status
======== Autoscaler status: 2025-12-19 06:13:35.616463 ========
Node status
---------------------------------------------------------------
Active:
 1 node_5266c0f87bdcf414fafb9690e2001f2030eea41d579758f645b865c4
 1 node_6695a1021780997d87830b40f5a1c92e6791f91d43c29ea0aa4ec97f
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Total Usage:
 0.0/40.0 CPU
 0.0/2.0 GPU
 0B/166.83GiB memory
 0B/71.50GiB object_store_memory

From request_resources:
 (none)
Pending Demands:
 (no resource demands

Thank you for these detailed steps! Will continue to communicate with engineering.

@baristankut I’m puzzled by your nccl speed. 39.34 GB/s x 8 bits/byte = 314.72 Gb/s. On a 200 GB/s C7X connection? And not counting overhead bytes?

How did you measure the 39.34?

A number of us have successfully used vLLM using @eugr ‘s repository (and also @mark440 ‘s, which shares some of the code).

That is reporting the bidirectional total bandwidth.

Ah. Do you know how to obtain that measurement?

I am not sure if you are asking about vLLM measurement or just the NCCL measurement. I wrote my own benchmark to play with. It’s rudimentary and probably better to enter in IP address yourself instead of scanning, but it basically says “If I can passwordless SSH into you, I have permissions” and then it will copy the AppImage into a nccl_benchmark directory on the other node. This is supposed to work on many nodes, but my code might be buggy and it might only work on 2 nodes.

like with anything, you can pick however you want to do the math. I posted an older version in the thread about GPUDirect being missing (you don’t really need it – it’s fast enough if you take advantage of unified RAM and are writing the stuff yourself; it is slower if you don’t write the code yourself).

It’s nice to see the difference between 10GbE and 200GB ConnectX-7. The DGX Spark is “expensive” if you don’t consider the value of the 200G interface and just use it for single box LLM inference.

NCCLBenchmark-aarch64.zip (35.5 MB)

1 Like

Having a single spark is basically wasting $1500-$1700 infiniband module. Roughly 40% of the value.

1 Like

Thanks for the detailed reply. I was thinking about the nccl piece. For my purposes I simply use Nvidia’s gather_perf tool, which gives about 22 +/- GB/s. But I am a user rather than a producer of exotic programs.

My understanding is that gather_perf doesn’t count bidirectional, so if you were measuring point to point and it was bidirectional, it could be “reported” as 44GB/sec.

The main difference with my benchmark is that my coordinates directly via TCP instead of MPI, which might reduce overhead and since my original goal was to understand the impact of no GPUDirect, I also do various CPU to GPU and GPU1 to GPU2 type measurements in additional to the raw NCCL measurements to make sure nothing was falling back to a 10GbE connection.

Have you read about Apple introducing RDMA into MacOS and how people are starting to use it to spread LLM work among several Macs at Thunderbolt 5 speeds? Any thoughts?

In NCCL perf tool it is reported as “alg bandwidth”. It’s literally just amount of data processed divided by time. Bus bandwidth tries to measure physical connection speed (unidirectional).

1 Like

Talking to a worker at microcenter last night about how annoyingly hard it is to find the qsfp cables and he told me that he wishes nvidia just bundled the cable because no one ever just buys one spark.

1 Like