Hardware refresh?

I am considering a GB10 server and have read some posts on this forum to help with my buying decision. I have to applaud some of the superhuman efforts here to boost tokens/second, which I gather is limited mainly by memory bandwidth. It seems Nvidia has made some software improvements, but maybe that has hit a wall.

Here is my naive question: Is there any chance a hardware refresh of the GB10 line could fix the memory bandwidth issue? And if so, what is the best guess for when a hardware refresh might occur?

First, you must ask yourself: What is my goal in AI? Do you want to focus on inference, or are you looking to perform other operations like quantization and fine-tuning? I own a GB10, but for many AI tasks, a simple RTX 5090 is far superior. As for increasing memory bandwidth: it is determined entirely by the memory, and since the memory is soldered, it cannot be upgraded.

You might not find clear information on this specific point: The Spark has two unique characteristics in the NVIDIA ecosystem. It is not a full Blackwell SM120 architecture; it is Grace Blackwell SM121. Having a higher version number does not imply forward compatibility; on the contrary, it requires extensive adaptation work from developers (drivers and libraries), often forcing a fallback to code paths for Ampere (SM80) architectures. These older architectures lack TenG5 and hardware concurrency, which leads to exactly what you’d expect: issues with NVFP4 and critical deficiencies in Continuous Batching.

Furthermore, the memory is a bottleneck: it isn’t LPDDR7X or LPDDR6X, but LPDDR5X, technology from two generations ago. As of May 2026, a legitimate question arises: can small models like Qwen 3.6 27B and upcoming LLMs run inference faster on a ‘consumer’ RTX 5090 than on a professional RTX 6000 Blackwell? The answer is likely yes, because models that fit entirely within VRAM (using TurboQuant for context compression) can run significantly faster on the RTX 5090’s 512-bit bus at 1.75 TB/s than on the RTX 6000 Blackwell Pro’s 1.25 TB/s. (The GB10 Spark is at 0.270 TB/s.)
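
To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream decoding of a dense model is purely bandwidth-bound, so every generated token requires reading all the weights once. The bandwidth figures are the ones quoted in this post, not verified specs; the 4-bit quantization (0.5 bytes per parameter) and the function names are my own assumptions for illustration. It ignores KV cache traffic, compute limits, and software overhead, so real numbers will be lower.

```python
# Rough upper bound on decode speed for a bandwidth-bound dense model.
# Assumption: each generated token reads every weight exactly once.

# Bandwidth figures (TB/s) as quoted in the post above; not verified specs.
BANDWIDTH_TB_S = {
    "RTX 5090": 1.75,
    "RTX 6000 Blackwell": 1.25,
    "GB10 Spark": 0.270,
}


def decode_tokens_per_second(bandwidth_tb_s: float,
                             params_billions: float,
                             bytes_per_param: float) -> float:
    """Return the theoretical tokens/s ceiling: bandwidth / model size."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes


def estimate_all(params_billions: float = 27.0,
                 bytes_per_param: float = 0.5) -> dict:
    """Estimate the ceiling for each device (defaults: 27B at 4-bit)."""
    return {
        name: decode_tokens_per_second(bw, params_billions, bytes_per_param)
        for name, bw in BANDWIDTH_TB_S.items()
    }


if __name__ == "__main__":
    for name, tps in estimate_all().items():
        print(f"{name}: ~{tps:.0f} tokens/s (theoretical ceiling)")
```

The ratio between devices is simply the ratio of their bandwidths, which is why no software optimization on the Spark can close the gap for this kind of workload.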

Ask yourself: What do you want to achieve with AI? Then, ask Claude Opus 4.7 what you should study and which hardware you should buy to make it happen.

Not any time soon. This segment/hardware isn’t a corporate priority.

Thank you both. I have reconfirmed my needs and reevaluated the available options, and I am going to take the GB10 plunge.

From what we know, NVIDIA likely will not refresh GB10 systems until around 2027.

A big reason is that NVIDIA’s upcoming N1 laptop chips use the same silicon family as the DGX Spark / GB10, which suggests NVIDIA plans to support this platform for a while rather than replace it soon.

I also do not see much value in the DGX Workstation at its current price. If your priority is VRAM capacity, clustering multiple DGX Spark units can get you similar memory for significantly less money.

The main point would be running really large dense models on HBM, or expanding it with a Pro 6000. And you still have a lot of memory and compute. It’s a Karpathy-level machine.