DGX Spark (SM121) Software Support is Severely Lacking - Official Roadmap Needed

We operate a 4-node DGX Spark cluster. After extensive testing, we must be direct: SM121 software support is fundamentally incomplete.

The Core Problem:
SM121 (GB10) lacks proper software ecosystem support. Even “fixed” issues are workarounds that disable Blackwell features.

Current Issues:

  1. PyTorch: No official CUDA 13.0 ARM64 wheels on PyPI (pytorch/pytorch#160162 unresolved). Custom index URL required.
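For anyone hitting this, the required install pattern can be sketched as follows. This is only a sketch: the `cu130` tag follows PyTorch's usual index-naming convention for CUDA 13.0 wheels, and the exact URL should be verified against the official PyTorch install selector.

```python
# Sketch: derive the PyTorch wheel index URL for a given CUDA version.
# The "cuXYZ" tag (e.g. cu130 for CUDA 13.0) follows PyTorch's usual
# index-naming convention; verify against the official install selector.

def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version like '13.0' to a PyTorch wheel index URL."""
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

# The resulting pip invocation on DGX Spark would be roughly:
#   pip install torch --index-url https://download.pytorch.org/whl/cu130
print(torch_index_url("13.0"))
```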

  2. Triton: Issue #8335 “closed” by treating sm121 as sm80 - this disables Blackwell-specific features entirely.

  3. FlashInfer: Compilation failures on GB10 (flashinfer#2252 open, needs-triage). No official support.

  4. CUTLASS: FP8 kernels fail to dispatch on sm121.

  5. MoE Kernels: No optimized configs exist for NVIDIA_GB10 - runtime warning confirms this.

  6. vLLM: Official release requires workarounds (vllm#31128 open). NGC container stuck at v0.11.

  7. SGLang: Running on unofficial branch with temporary workarounds (sglang#11658 open since launch).

Questions:

  1. When will sm121 receive native support instead of sm80 fallbacks?
  2. What is the official roadmap for GB10 software parity with sm120 (RTX 50xx)?
  3. Who at NVIDIA owns DGX Spark software readiness?

We purchased enterprise hardware. We expect enterprise software support.

25 Likes

I’m working on vLLM, FlashInfer, and CUTLASS in another thread. I’d welcome help improving the open-source support.

4 Likes

I completely agree. Right now there’s a community project called dgx_spark_config ( GitHub - GuigsEvt/dgx_spark_config: Complete end-to-end setup for maximizing DGX Spark compute for AI Workloads ) that compiles PyTorch for DGX Spark with SM12.1 compatibility, but there’s an issue: their script doesn’t build torch._utils, which the transformers library requires (see issue #2 on dgx_spark_config). I’d suggest adding these environment variables:

export USE_PYTHON=1
export BUILD_PYTHON=1
export BUILD_LIBTORCH_PYTHON=1
export BUILD_CAFFE2=0

before compiling their PyTorch fork.

2 Likes

I think what this thread is attempting to do is shine a light on the fact that the community should not be on the hook for dumpster-diving through git commits and incomplete documentation in order to get marketed features working on a platform that was pitched as a test bed for production systems.

If the idea was for these systems to then ‘scale’ to their larger brothers, why is the architecture (sm121) completely different from the existing Grace Blackwell systems (sm100)?

Asking for a roadmap and clear intentions for the longevity of a device like this is fair.

14 Likes

I’ve made a few similar posts in desperation. I have flat-out come to the conclusion that they are moving on from this product. Everything in the rest of the ecosystem is moving at a rapid pace, but support for this product falls behind what you would expect from a kickstarter operation, much less the multi-trillion valuation leader in AI hardware. It’s hard to tell if there is anyone at all working on addressing the issues you note above, or the myriad other issues noted. I will say, the community has been great.

At some point advertising this as a Blackwell-class system, and then never offering any actual Blackwell-class platform optimizations for it strikes me as very misleading.

3 Likes

Hello,

Thank you for the detailed feedback and for articulating your concerns so clearly. We understand the expectations that come with deploying NVIDIA systems such as DGX Spark, and we appreciate the opportunity to clarify both the current state and the roadmap.

Below we address your points and questions directly, with additional context.


Clarifications on the Reported Issues

1. PyTorch (CUDA 13.0, ARM64 wheels)

PyTorch wheels are distributed via a custom PyTorch index URL by design. This is not specific to SM121 or GB10. PyPI does not support publishing multiple CUDA variants of the same package, which affects all major frameworks (PyTorch, vLLM, SGLang, etc.).

This does not indicate a lack of compatibility. On the contrary, best practices strongly recommend using the official framework indexes to ensure you receive fully validated, CUDA-enabled builds.

Additional context:

  • CUDA kernels are compiled at the major architecture family level (sm12x), not per individual SKU.

  • Only certain Tensor Core–specific kernels require conditional compilation, which is already handled in the codebase.

  • PyTorch 2.11, scheduled for release on January 21, includes FBGEMM and CUTLASS matmul integrations, further improving performance on sm12x platforms.
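The family-level point above can be illustrated with a toy dispatch model. All names here are illustrative, not any library's real API: the idea is only that generic kernels are built once per sm12x family, while a handful of Tensor Core paths are gated per exact SKU.

```python
# Toy illustration of family-level kernel builds: generic kernels are
# compiled once per major architecture family (sm12x), while only a few
# Tensor Core paths are conditionally compiled per SKU.
# Function and kernel names are hypothetical, for illustration only.

def arch_family(major: int, minor: int) -> str:
    """Bucket a compute capability into its major architecture family."""
    return f"sm{major}x"

def needs_per_sku_build(kernel: str) -> bool:
    # Hypothetical gate: only certain Tensor Core kernels are SKU-conditional.
    return kernel in {"tensor_core_mma_fp8"}

# sm120 (RTX 50xx) and sm121 (GB10) share one family-level build.
assert arch_family(12, 0) == arch_family(12, 1) == "sm12x"
assert not needs_per_sku_build("gemm_generic")
```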


2. Triton

Triton operates as an independent project, and NVIDIA actively collaborates and contributes upstream.

The issue you reference is a known pattern from prior architectures:

  • Triton is currently built against CUDA 12.8 internally.

  • On DGX Spark, users can either:

    • Compile Triton locally, or

    • Use the Triton build bundled with the official PyTorch wheels (recommended).

Relevant bugs have been addressed in Triton 3.6.0, which resolves the sm12x handling concerns.


3. FlashInfer

Support for sm12x was added starting in FlashInfer v0.5.8.
Notably, the wheels are now explicitly built targeting sm12x, ensuring compatibility with GB10-class devices.


4. CUTLASS

CUTLASS fully supports sm12x today.
Additional optimizations, including new MMA functions, are landing in the CUTLASS v4.4.x series to further enhance performance on Blackwell-class GPUs.


5. MoE Kernels

This is an area of active development. Optimized configurations for GB10 are being worked on and will be introduced incrementally in upcoming releases.


6. vLLM

NGC container versioning will be made more explicit in upcoming releases and on the Build & Spark documentation pages.

In the meantime:


7. SGLang

SGLang runs correctly on DGX Spark today using its official custom wheel distribution:


Responses to Your Direct Questions

1. When will SM121 receive native support instead of SM80 fallbacks?

sm80-class kernels can execute on DGX Spark because Tensor Core behavior is very similar, particularly for GEMM/MMAs (closer to the GeForce Ampere-style MMA model). DGX Spark does not have tcgen05 like Jetson Thor or GB200, due to die space devoted to RT Cores and the DLSS algorithm.

This is a compatibility feature, not a permanent fallback. Native sm12x-optimized kernels are being introduced progressively across libraries.

Example reference:
https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/ampere/flash_attention_v2.py
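The compatibility-vs-fallback distinction can be sketched as a simple preference order: run a native sm121 build when one exists, otherwise fall back to an sm80-style kernel that the hardware can still execute. This is a toy model under that assumption, not any framework's actual dispatcher.

```python
# Toy model of the fallback behavior described above: prefer a native
# sm121 kernel build when available, otherwise run a compatible
# sm80-style kernel. Not any framework's real dispatch logic.

PREFERENCE = ["sm121", "sm12x", "sm80"]  # best to worst for GB10

def pick_kernel(available: set) -> str:
    """Return the most specific compatible kernel build available."""
    for arch in PREFERENCE:
        if arch in available:
            return arch
    raise RuntimeError("no compatible kernel build")

# Today: many libraries only ship sm80-class paths, so that is what runs.
assert pick_kernel({"sm80"}) == "sm80"
# As native builds land, the same dispatch picks them up automatically.
assert pick_kernel({"sm80", "sm121"}) == "sm121"
```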


2. What is the official roadmap for GB10 software parity with SM120 (RTX 50xx)?

GB10 already has software parity with RTX 50xx.
Both platforms belong to the same sm12x architecture family, and the software stack is aligned at that level.


3. Who at NVIDIA owns DGX Spark software readiness?

DGX Spark software readiness is a cross-functional responsibility.
Multiple NVIDIA teams—CUDA, frameworks, libraries, NGC, and systems—work together to deliver and validate the end-to-end experience.


Closing

We recognize that deploying DGX Spark at scale requires not only hardware capability but also a mature and transparent software ecosystem. Your feedback is valuable and is actively influencing prioritization across teams.

We remain committed to delivering enterprise-grade software support that matches the expectations of enterprise customers.

14 Likes

Thank you for your detailed response, johnny_nv. However, after verification, I found several factual errors that need to be addressed.

Incorrect Version Numbers

1. PyTorch 2.11 (January 21)

Your claim: “PyTorch 2.11, scheduled for release on January 21”
Fact: January 21, 2026 is the release date for PyTorch 2.10, not 2.11.

Source: PyTorch Milestone #57

“M6: Release Day (1/21/26)” - for version 2.10.0


2. Triton 3.6.0

Your claim: “Relevant bugs have been addressed in Triton 3.6.0”
Fact: Triton 3.6.0 does not exist. The latest release is v3.5.1 (November 12, 2025).

Source: Triton GitHub Releases


3. FlashInfer v0.5.8

Your claim: “Support for sm12x was added starting in FlashInfer v0.5.8”
Fact: FlashInfer v0.5.8 does not exist.

  • Latest stable: v0.5.3 (November 20, 2025)
  • Latest pre-release: v0.6.0rc2

Source: FlashInfer GitHub Releases


4. CUTLASS v4.4.x

Your claim: “Additional optimizations landing in CUTLASS v4.4.x”
Fact: The latest CUTLASS version is v4.3.5 (January 9, 2026). v4.4.x does not exist yet.

Source: PyPI nvidia-cutlass-dsl


Misleading Claims

5. SGLang “runs correctly on DGX Spark today”

Fact: GitHub Issue #11658 tracking DGX Spark support is still OPEN with unresolved problems:

From the issue description by @yvbbrjdr:

  • “The branch currently includes several temporary workarounds”
  • “Outdated base… rebasing onto main may not succeed cleanly”
  • “Triton issue: PTXAS compilation error… remains unresolved”
  • “FP8 CUTLASS kernels currently fail to dispatch on GB10”
  • “All external dependencies… have been disabled due to unknown compatibility”

The lmsysorg/sglang:spark image uses an unofficial development branch with temporary workarounds.

Source: SGLang Issue #11658


Summary

Claim                       Reality                               Status
PyTorch 2.11 (Jan 21)       PyTorch 2.10 releases Jan 21          ❌ Wrong version
Triton 3.6.0                Latest is 3.5.1                       ❌ Non-existent
FlashInfer v0.5.8           Latest stable is v0.5.3               ❌ Non-existent
CUTLASS v4.4.x              Latest is v4.3.5                      ❌ Non-existent
SGLang “runs correctly”     Issue #11658 open with workarounds    ❌ Misleading

Questions

  1. Why are non-existent version numbers being cited in an official NVIDIA response?
  2. Are these internal/unreleased versions? If so, when will they be publicly available?
  3. Can we get an accurate, verified status of DGX Spark software support?

We understand software ecosystems move fast, but accurate version information is critical for enterprise deployment planning. Four non-existent version numbers and a misleading claim about SGLang in a single response raise concerns about the accuracy of official communications.

We look forward to a corrected response with verified information.


8 Likes

Thank you for the detailed feedback. I’d like to clarify and correct a few points, as some of the issues referenced have already been addressed or are based on outdated information.

Corrections and current versions

Frameworks and wheels

  • SGLang and vLLM: Both projects release wheels at a faster cadence than their stable Docker tags. In addition to official releases, nightly wheels are available for SGLang, vLLM, and PyTorch, allowing users to validate the latest fixes and improvements as they land. On our side, we are working to publish new containers.

  • CUTLASS: CUTLASS is compatible with DGX Spark starting from version 4.2.0, as documented in the official changelog: Release CUTLASS 4.2.0 · NVIDIA/cutlass · GitHub
    The latest version is 4.3.5 (Release CUTLASS 4.3.5 · NVIDIA/cutlass · GitHub), and the upcoming v4.4.x series brings an improved CuTe DSL.

Regarding linked issues
Several of the issues referenced have already been resolved. In at least one case, the problem originated from a custom build taken directly from the main branch, which can introduce temporary or unrelated regressions. @shahizat
Additionally, SGLang is compatible, as I show here: Run SGLang in Spark, and vLLM as well: Run vLLM in Spark. The referenced thread is primarily a discussion rather than an indication of a lack of support, and I am active in it.

We recommend tracking official releases, or nightly builds for more experienced developers and for testing.

1 Like

Thank you for the follow-up. After thorough verification:

Triton 3.6.0: The version exists only in PyTorch’s TEST index, not as an official stable release. Recommending pre-release software to enterprise customers is concerning. When will 3.6.0 be officially released?

FlashInfer 0.5.3: The changelog mentions SM121, but users still encounter ninja compilation errors on GB10. Your own forum posts recommend “12.1f, NOT 12.0” builds; that’s a workaround, not plug-and-play support.

CUTLASS: C++ API confirmed working. Python DSL still restricts FP4 to sm_100a only (Issue #2800 open).

SGLang/vLLM: Both run with workarounds:

  • SGLang: via lmsysorg/sglang:spark (personal dev branch, not mainline)
  • vLLM: requires --enforce-eager (20-30% performance loss)
  • Your own forum post (Dec 2025): “CRITICAL FINDING: Native Solution Failed, Workaround Used”
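For context, the vLLM workaround in question amounts to forcing eager execution at launch. The `--enforce-eager` flag is real vLLM CLI surface (it disables CUDA graph capture, trading throughput for compatibility); the helper below is only an illustrative sketch of assembling that command, with a placeholder model name.

```python
# Sketch of building the vLLM launch command with the eager-mode
# workaround. "--enforce-eager" disables CUDA graph capture, which is
# the source of the reported 20-30% throughput loss. The model name
# below is a placeholder, not a recommendation.

def vllm_serve_args(model: str, enforce_eager: bool = True) -> list:
    """Assemble the `vllm serve` command line as a list of arguments."""
    args = ["vllm", "serve", model]
    if enforce_eager:
        args.append("--enforce-eager")
    return args

cmd = vllm_serve_args("some-org/some-model")
assert "--enforce-eager" in cmd
```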

The distinction matters: “Works with workarounds” ≠ “Supported.” Enterprise customers need to know the difference upfront.

Side note: I see 8 people liked the original response. Genuinely curious—are you all running on different hardware, or am I in the wrong forum? Because if everyone’s happy with “compile from dev branch + apply manual patches + accept 20-30% perf loss” as the definition of “supported,” then maybe I missed the memo.

Or is everyone just… okay with this?


Happy to be corrected with specific commit hashes or release notes.

7 Likes

Triton 3.6.0
Triton releases are tied to PyTorch releases. Version 3.6.0 is currently delivered as part of PyTorch’s release pipeline and is available via the PyTorch test/nightly index because it is a release candidate, not a standalone Triton GA.
The official release is scheduled to coincide with the next PyTorch release, currently targeted for January 21. I referenced 3.6.0 specifically to point to where the SM121 fixes already exist, not to suggest enterprises should standardize on unreleased software long-term.

FlashInfer
Have you tested against the latest FlashInfer (0.5.3)? The changelog explicitly calls out SM121 support. I have been able to compile it cleanly since 0.5.0, and as of 0.5.3 FlashInfer distributes all wheels for DGX Spark.

Regarding CUDA builds: CUDA 12.0f is the correct baseline for GeForce Blackwell general support. CUDA 12.1a is only required if you need chip-specific architecture features, as explained here:
https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
Using a specific build variant to unlock optional features should not be interpreted as a workaround; it’s an architectural capability distinction.
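To make the suffix distinction concrete: per the CUDA 12.9 naming described in that blog post, an “f” target carries family-wide features and an “a” target carries architecture-specific features for one exact chip. The tiny helper below only encodes that naming convention; which suffixes are valid for which compute capabilities should be checked against NVIDIA's documentation.

```python
# Sketch of the CUDA target-suffix naming from the CUDA 12.9 scheme:
#   sm_120f -> family-specific features (runs across the 12.x family)
#   sm_121a -> architecture-specific features (exact chip only)
# Which suffixes are valid for which capabilities should be verified
# against NVIDIA's nvcc documentation; this only encodes the naming.

def nvcc_target(major: int, minor: int, scope: str = "base") -> str:
    """Build an nvcc target string like 'sm_120f' or 'sm_121a'."""
    suffix = {"base": "", "family": "f", "arch": "a"}[scope]
    return f"sm_{major}{minor}{suffix}"

assert nvcc_target(12, 0, "family") == "sm_120f"  # broad baseline
assert nvcc_target(12, 1, "arch") == "sm_121a"    # chip-specific features
```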

CUTLASS
The majority of frameworks still use the C++ API. That said, the CuTe DSL gains new features with each version. You can see one example here: SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm by hholtmann · Pull Request #29711 · vllm-project/vllm · GitHub
In any case, the CuTe DSL is compatible with DGX Spark and you can use it today, and more MMA operations are coming.

SGLang
lmsysorg/sglang:spark is indeed a community member’s branch, not mainline. It exists to unblock users who adopted at launch.
That said, SGLang does run using official wheels and users are free to:

  • Build their own Docker images

  • Use official wheels

  • Use NGC containers

Nothing requires that specific branch; it is one of several available options.

vLLM
The --enforce-eager flag is required in certain versions to maintain correctness.
Compatibility and performance improvements are actively landing. You should see updates in vLLM 0.14.0, expected shortly, which improves Blackwell compatibility and reduces reliance on eager execution.

On “works” vs. “supported”
I agree with you that the distinction matters. My intent was not to blur that line, but to enumerate what is currently possible and where fixes already exist, depending on risk tolerance and deployment maturity.
Different customers sit at different points on that spectrum, so I outlined the available paths rather than prescribing a single one.

I don’t know what type of developer or deployment model you’re running, so I deliberately showed all current options, from conservative to early-access, so you can choose what aligns with your operational requirements.

4 Likes

Hi @johnny_nv — thanks for the follow-up.

In the published DGX Spark specifications, the GPU is described as Blackwell with “5th Generation Tensor Cores” (and NVIDIA marketing highlights ~1 PFLOP FP4 capability). In your response, you also wrote:

“DGX Spark does not have tcgen05 like Jetson Thor or GB200, due to die space devoted to RT Cores and the DLSS algorithm”

Can you clarify what this means precisely?

  1. When NVIDIA says “5th Generation Tensor Cores” for DGX Spark, does that correspond to tcgen05 (and related TMEM / 5th-gen TC instruction paths), or is it something different on SM121/GB10? If it’s different, what is the correct mapping between the spec wording and the actual instruction set/features available on SM121?

  2. Is FP4/NVFP4 training (not just theoretical TOPS) supported on DGX Spark today? If yes, what is the supported stack and workflow (CUDA/driver + cuBLASLt/CUTLASS + Transformer Engine versions, required build flags, etc.)?

  3. If FP4/NVFP4 training is not supported today on SM121, can you provide an official roadmap (target versions/dates) for when it will be? We purchased DGX Spark specifically to evaluate FP4 training, and the lack of a supported path is blocking our work.

  4. There are multiple ongoing community threads/issues around NVFP4/SM121 support (Transformers Engine and CUTLASS) that have not resulted in a working, officially-supported solution for DGX Spark. Can you please escalate this to the DGX Spark software readiness owner / product manager and provide a point of contact (or an official statement) so customers can plan?

An authoritative clarification here would help the whole community avoid guesswork and conflicting interpretations.

7 Likes

Thank you for the detailed response. I appreciate the transparency—it’s far more informative than the initial reply.

However, I need to escalate a broader concern that goes beyond technical minutiae.


The gap between marketing and reality

NVIDIA CEO Jensen Huang announced DGX Spark with these words:

“DGX-1 launched the era of AI supercomputers and unlocked the scaling laws that drive modern AI. With DGX Spark, we return to that mission — placing an AI computer in the hands of every developer to ignite the next wave of breakthroughs.”

“Placing an AI supercomputer on the desks of every data scientist, AI researcher and student empowers them to engage and shape the age of AI.”

NVIDIA’s official marketing promises:

  • “The world’s smallest AI supercomputer”

  • “Petaflop powerhouse built for creators, researchers, and developers”

  • “Data center-level AI capabilities to a desktop”

  • “Full NVIDIA AI software stack — frameworks, libraries, pretrained models”

  • “Seamlessly move models from desktop to cloud with virtually no code changes”

What we actually received:

  • SM80 fallback kernels because native SM121 paths are “being introduced progressively”

  • MoE kernels described as an “area of active development”

  • FlashInfer requiring manual compilation with unofficial arch flags

  • Triton fixes existing only in unreleased RC builds

  • vLLM requiring --enforce-eager with 20-30% performance penalty

  • SGLang working via community dev branches, not mainline

  • Forum documentation stating: “Native Solution Failed, Workaround Used”


The financial reality

This is not a hobbyist experiment. My investment:

  • 4× DGX Spark: $16,000 USD

  • Network switch for multi-node clustering: $1,000 USD

  • 2× DAC cables: $290 USD

  • Total: ~$17,300 USD

Customers purchased this hardware based on explicit promises from NVIDIA’s CEO and official marketing. We expected production-ready software to match the “supercomputer” branding—not a beta testing program where users debug kernel compilation failures on community forums.


The language shift tells the story

Your initial response used: “fixed,” “compatible,” “resolved,” “supported.”

This response uses: “being introduced progressively,” “area of active development,” “will be introduced incrementally,” “expected shortly.”

These carry fundamentally different implications. The first suggests production-ready status; the second accurately describes an evolving ecosystem with significant gaps.


My request

I understand you’re a technical contributor, and I genuinely appreciate your engagement on this forum. But this feedback needs to reach product management and executive leadership:

Customers who purchased DGX Spark based on NVIDIA’s marketing are beginning to feel they were sold a vision, not a product.

When a $4,000 device is marketed as an “AI supercomputer” with a “full software stack,” customers reasonably expect mature software support—not forum threads explaining why native solutions failed and workarounds are required.

I respectfully request this sentiment be escalated. The gap between what was promised and what was delivered is eroding trust among early adopters who made significant financial commitments based on NVIDIA’s public statements.

Thank you for your continued engagement.

14 Likes

@eugr Hi

I’ve noticed you’re one of the most dedicated contributors on this forum, consistently providing workarounds and solutions for DGX Spark issues. Your efforts are genuinely appreciated by the community.

However, I’m curious why this particular thread hasn’t received your input. This discussion addresses a fundamental concern: the gap between how DGX Spark was marketed (“production-ready AI supercomputer,” “full software stack,” “seamless deployment”) and the actual state of framework support that requires community-driven workarounds.

I want to raise a philosophical question that’s been on my mind:

Do workarounds help or hurt in the long run?

Every time a community member finds a workaround for FlashInfer compilation, Triton PTX errors, or vLLM compatibility issues, it:

  • ✅ Helps individual users get unstuck (valuable!)

  • ❌ Reduces pressure on NVIDIA to deliver proper native support

  • ❌ Creates a false impression that issues are “resolved”

  • ❌ Shifts engineering burden from a trillion-dollar company to unpaid volunteers

With Rubin arriving in H2 2026 and NVIDIA’s aggressive annual cadence, my concern is that SM121/GB10 will never achieve true software maturity before resources shift to the next architecture.

You’ve spent countless hours helping this community. What’s your honest assessment:

  1. Do you believe DGX Spark will ever reach the “production-ready” state it was marketed as?

  2. Is there a more important topic you’re focused on that I’m missing?

  3. Do you think community workarounds inadvertently let NVIDIA off the hook?

Your silence on this thread is notable precisely because your voice carries weight here. I’d genuinely value your perspective.

Respectfully,

3 Likes

As clear as it is that you are copying and pasting Johnny’s responses into your AI chat of choice (and vice versa with the output), what you are trying to accomplish is appreciated. However, I myself (who is similarly invested, dollar-wise) understand marketing lingo, especially in the computing space.

All those points you highlighted from Jensen always come with asterisks.

Also this is a dev device at the end of the day. It will constantly evolve.

Your LLM will always have a bias/be on your side btw - so it doesn’t matter what Johnny responds with. The next output will be something you feel should be placed here.

I would caution you to articulate things from your own practical use and understanding.

The ONE thing I DO want more for the DGX, which is something the other dev products like the Thor and Jetson have - transparency on WHEN the big kernel updates are coming.

An ETA at least.

1 Like

@Balaxxe

So let me understand your argument here:

You noticed I’m using AI to help structure my research. Okay, and?

Does that mean:

  • The GitHub issues I cited don’t exist?

  • FlashInfer’s release notes magically include SM121 now?

  • Triton 3.6.0 suddenly appeared on PyPI?

  • vLLM Issue #31128 is a hallucination?

  • The forum posts saying “Native Solution Failed, Workaround Used” were AI-generated fiction?

Every single claim I made has a source link. Click them. Verify them yourself. That’s the beauty of citations - they don’t care who compiled them.

You say “LLM will always be on my side” - but an LLM can’t fabricate a GitHub issue number that actually exists when you click it. It can’t invent forum threads with real timestamps. It can’t manufacture NVIDIA’s own release notes.

Your entire response contains zero technical counterarguments. Not one. You didn’t say:

  • “Actually, FlashInfer 0.5.3 does support SM121, here’s the proof”

  • “Triton 3.6.0 is available here: [link]”

  • “vLLM works natively without --enforce-eager, I tested it”

Instead, you said “marketing lingo comes with asterisks” and “it’s a dev device.”

You know what’s interesting? You ended by saying you want “transparency on WHEN the big kernel updates are coming. An ETA at least.”

That’s literally my point. Thank you for validating it.

The tool I use to articulate my argument is irrelevant. The argument stands or falls on evidence. Attack the evidence, not the messenger’s methodology.

3 Likes

No, I’m simply highlighting that the output will always be worded in a way that convinces you: “yeah, I should follow up with this, it’s valid.” You came in with a bias because you are upset, and you will maintain that bias until all items are resolved. You are communicating through a tool, creating an echo chamber.

The fixes won’t be today and they won’t be tomorrow.

You made your point two posts ago.

The thing about people, specifically in this day and age, is that when it’s clear an LLM is running a thread, the thread dies.

So chill on that.

I would like the devs to continue to respond, as well as other community members (like eugr).

BUT… it’s not incentivizing to do that when there are multiple LLMs controlling the narrative with the singular bias of “I’m upset the issues aren’t resolved right now.”

We all have been asking for transparency via multiple mediums.

You will NOT be able to fit enough context into the window you are using with your LLM to get a true idea of what the actual pervasive issues that need to be addressed are.

Users like Eugr, who have actually contributed and dug VERY deep into all of this, will.

This will be my last post in this thread as I do not wish it derailed. I’m not going to waste more of my time and battle an LLM today.

Best of luck.

5 Likes

First of all, let me address me not participating in this thread.

As much as I like helping others on this forum, I don’t always have time for it. With the holiday season behind us, business is starting to pick up again, so I don’t have much time to spend on this right now. I may miss a thread or two, or be selective about which ones I reply to.

As for your other questions… It’s already been discussed in countless threads, but basically the reality is that NVIDIA’s cash cow is not consumer or even prosumer devices like DGX Spark, it’s their datacenter offerings, so their focus will stay there.

Also, PyTorch, vLLM, Flashinfer, Triton, etc - are community projects. Now, I wish NVIDIA invested more time in helping to support and optimize for consumer-grade hardware, but they were never “on the hook”. I can’t speak for others, but I wasn’t going to sit and wait until NVIDIA rolls out proper support. I could sell my Sparks and get another setup, like dual RTX6000 for twice as much money, which would be a better price/performance ratio, but won’t get me low power devices that I can keep running 24/7 at home. Or I could put some extra effort and get it working for my use cases. I chose the latter, but it was a good excuse to dive deeper into inference engines and what not, so it was a valuable learning experience for me.

The bottom line is that NVIDIA marketing is misleading in many aspects, like positioning this as a “baby Grace Blackwell”, while it’s a completely separate lineage that is much closer to RTX5090/6000. And lots of other things that you’ve mentioned. In reality, it’s still the best device of its class that you can buy today as long as you understand its limitations.

I’d say, it’s much, much worse for consumer AMD devices, just look at Strix Halo (I have that one too).

As for production-ready state… I guess, depends on a definition of “production”. For me, it is in this state already, as I can run my workflows on it with acceptable performance, however the full potential is still not realized, but it’s true for other consumer Blackwell chips too (and in some cases, even datacenter ones).

15 Likes

BTW, it does. Even without my Docker build, the prebuilt vLLM cu130 wheels work just fine now.

3 Likes

What if NVIDIA has several large software optimizations planned for DGX Spark in 2026 and is deliberately less supportive now to see what the community develops?

That would give the public time to purchase only one or two boxes, given the poor sentiment and the lack of official support for more than two connected boxes, and would help NVIDIA locate good developers.

Will there even be a Rubin version of the Spark?

Hello,

  1. All GPUs feature 5th-generation Tensor Cores, with designs optimized per market segment. SM120 GPUs (5090, 6000, Spark, etc.) use a Tensor Core implementation tuned for their target workloads, enabling strong compatibility with Ampere/Ada kernels and high TFLOPS with minimal changes, while datacenter Blackwell Tensor Cores with Tensor Memory are designed for platforms that benefit from larger on-core memory. cutlass/python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py at 8debf77437753beca676eb3c6bf97b56a5f9fd68 · NVIDIA/cutlass · GitHub

    1. We distribute for all devices.
    2. It is supported, but understand that AI is a big ecosystem; we are working across all frameworks with all parties, and things can be delayed. Did you try the latest Transformer Engine 2.11? If you still have problems, I am going to investigate it internally.
    3. A lot of teams read these forums.
5 Likes