Is transient freezing expected behavior?

alan.dang · October 26, 2025, 3:39pm

When loading a large model like gpt-oss-120b, about 80% of the way through loading, the system freezes and becomes unresponsive. The mouse cursor won't move, and any animations in Firefox in the background also stall. After 15-20 seconds, it resumes. (This is with a 65K context window in LM Studio.) Everything works fine after this pause.

If I don't clear the cache, running the multi-agent chatbot also crashes the system, and it does not recover; a reboot is required.

I assume this has something to do with running out of memory and how quickly the cache and virtual memory can be managed, but I wanted to be sure I didn't have bad LPDDR5x.

NVES · October 26, 2025, 4:49pm

Hi Alan, can you share the exact steps and frameworks you are using to load and run inference on gpt-oss-120b? We want to see if we can reproduce this locally and fix it if necessary.

alan.dang · October 26, 2025, 5:42pm

1. LM Studio freeze and resume:

Launch LM-Studio-0.3.30-2-arm64.AppImage
Load the official openai/gpt-oss-120b model with these settings:

context length 65536
GPU offload 36/36
CPU Threads 18
Offload KV Cache to GPU Memory (yes)
Keep Model in Memory (no)
Seed 94121
Custom Fields (High Reasoning Effort)

Runtime: CUDA 13 Llama.cpp 1.53.1

I'm not sure whether any of these settings matter; they are just what I changed from the defaults.
2. The crash occurs when following the playbook instructions for nvidia/multi-agent-chatbot (main branch) in NVIDIA/dgx-spark-playbooks on GitHub, if I have a lot of Firefox windows open and also do not clear the cache before starting the Docker containers that run everything.

Bibek · October 29, 2025, 4:23am

Hi Alan,

I don't see any issue with your LPDDR5x memory.

While we work on reproducing your exact scenario, I wanted to share a best practice we recommend: clearing the cache before starting any Docker container. For example:

NOTE

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. Because many applications are still being updated to take advantage of UMA, you may encounter memory issues even when you are within DGX Spark's memory capacity. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
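A minimal sketch of the same flush with a before/after report, assuming a Linux /proc/meminfo that exposes the MemAvailable field. The helper names mem_available_mib and drop_caches_and_report are hypothetical and not part of the DGX Spark playbooks. Printing MemAvailable on both sides shows how much reclaimable cache was actually released before the containers start:

```bash
# Hypothetical helpers, not part of any NVIDIA tooling.

# Print the kernel's MemAvailable estimate in MiB (/proc/meminfo reports kB).
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Flush dirty pages, drop the page cache plus dentries/inodes, and report
# the change. Must run as root: writing /proc/sys/vm/drop_caches requires it.
drop_caches_and_report() {
  local before after
  before=$(mem_available_mib)
  sync
  echo 3 > /proc/sys/vm/drop_caches || return 1
  after=$(mem_available_mib)
  echo "MemAvailable: ${before} MiB -> ${after} MiB"
}
```

For example, save it to a file (the name drop_caches.sh is arbitrary) and run sudo bash -c '. ./drop_caches.sh && drop_caches_and_report' right before starting the containers. Note that this only frees reclaimable cache; it cannot release memory that the GPU driver or running applications actually hold.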

Meanwhile, if you capture dmesg logs when you see this behavior, please share them here.

thanks

Bibek

alan.dang · October 29, 2025, 5:42am

[13786.069340] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
[13869.129129] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
[13877.074194] sh (220893): drop_caches: 3
[13924.293678] sh (221373): drop_caches: 3
[13997.970504] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359

It does produce this error, but the whole system doesn't crash. If you clear the cache AFTER LM Studio loads the model, memory usage is about 100GB. If I eject the model, usage is about 14GB at baseline.

When you watch it on the DGX Dashboard, memory usage seems to jump from 72.5GB all the way to 127GB, and the dashboard itself pauses. It doesn't always lock up for the full amount of time, though. If I clear the cache right before loading, the mouse cursor stays responsive, although you can see the dashboard freeze. Somehow the memory usage seems to double.

Is there some sort of kernel patch that would make an NV_ERR_NO_MEMORY automatically drop the caches?
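No kernel patch is offered in this thread. As a purely userspace workaround sketch (untested on DGX Spark; the function names are hypothetical), one could follow new kernel messages with journalctl and drop caches whenever an NVRM NV_ERR_NO_MEMORY line appears. A cooldown is included because, as the logs in this thread show, a single failure can emit several such lines within milliseconds:

```bash
# Hypothetical userspace workaround, not an NVIDIA-provided tool. Run as root.

# Succeeds if a kernel log message is an NVRM out-of-memory report.
is_nv_oom_line() {
  [[ "$1" == *NVRM:*NV_ERR_NO_MEMORY* ]]
}

# Follow new kernel messages and drop caches when an NVRM OOM appears.
# $1: minimum number of seconds between drops (default 30).
watch_and_drop() {
  local cooldown=${1:-30}
  local last=$(( -cooldown ))
  # -k: kernel messages only; -f: follow; -n 0: skip the existing backlog;
  # -o cat: print only the message text, without journal metadata.
  journalctl -k -f -n 0 -o cat | while IFS= read -r line; do
    if is_nv_oom_line "$line" && (( SECONDS - last >= cooldown )); then
      last=$SECONDS
      echo "NV_ERR_NO_MEMORY seen; dropping caches" >&2
      sync
      echo 3 > /proc/sys/vm/drop_caches
    fi
  done
}
```

This reacts only after the failed allocation has already been logged, so it cannot prevent the stall that triggered the message. At best, it frees cache for the next attempt.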

Bibek · October 29, 2025, 9:32am

Thanks for the quick response, Alan.

I believe the kernel log level is set to quiet, so you are not seeing the OOM killer messages in the log.

Can you increase the log level using the command below and then share the log?

sudo dmesg -n 8

Alternatively, after reproducing the issue, run dmesg and share the output.
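A minimal sketch for capturing the log after a reproduction, with hypothetical helper names (filter_mem_events, nv_oom_report) and an arbitrary default file name. It saves the full kernel ring buffer and then prints only the kinds of lines discussed in this thread: NVRM NV_ERR_NO_MEMORY reports, kernel OOM-killer messages, and drop_caches events:

```bash
# Hypothetical helpers for collecting memory-related kernel messages.

# Keep only lines about NVRM OOMs, the kernel OOM killer, or cache drops.
filter_mem_events() {
  grep -E 'NV_ERR_NO_MEMORY|Out of memory|oom-kill|oom_reaper|drop_caches'
}

# Save the full kernel ring buffer to a file, then print the filtered lines.
# sudo is used because dmesg is often restricted to root
# (kernel.dmesg_restrict=1).
nv_oom_report() {
  local out=${1:-dmesg-$(date +%Y%m%d-%H%M%S).log}
  sudo dmesg > "$out" || return 1
  echo "Full log saved to $out" >&2
  filter_mem_events < "$out"
}
```

Sharing the saved file in full, rather than only the filtered lines, keeps surrounding context (for example, what happened just before an OOM) available for whoever is trying to reproduce the issue.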

Another thing:

In your log we see "sh (220893): drop_caches: 3", which means drop_caches was run around when the issue occurred, but not before the model was launched? Did you run that yourself, or was it enabled as part of the instructions you followed?

thank you

alan.dang · October 29, 2025, 2:35pm

Last night, I got two OOM errors along with the pause in responsiveness. I then cleared the cache (which is what you see in the log) and reloaded. This time there was no pause in responsiveness other than a stutter, but it still generated the OOM error.

Today, I ran sudo dmesg -n 8.

I dropped the cache first and then loaded gpt-oss-120b. It didn't lock up, but I got more errors. Then I unloaded the model, loaded something smaller, and reloaded gpt-oss-120b. This time it DID lock up for ~5 seconds, but there was only one error in dmesg.

[44735.549239] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
[44997.271913] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
[44997.273761] NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from status @ kernel_graphics_context.c:1178
[44997.275550] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from kgrctxAllocMainCtxBuffer(pGpu, pKernelGraphicsContext, pKernelGraphics, pKernelChannel) @ kernel_graphics_context.c:1387
[44997.278153] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from kgrctxAllocCtxBuffers(pGpu, pKernelGraphicsObject->pKernelGraphicsContext, pKernelGraphics, pKernelGraphicsObject) @ kernel_graphics_object.c:214
[44997.442466] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00e18; hParent=0xbeed0100; hObject=0xbeed3901; hClass=0x0000cd40; paramsSize=0x00000000; paramsStatus=0x00000057; status=0x00000057

There are actually two different behaviors.

1. It always stalls the system as far as the blinking cursor in LM Studio, or even the polling interval of the DGX Dashboard, is concerned. At around 80% of the model load, where I expect the freeze, the steady linear increase in RAM usage halts and then jumps up to 127GB.
2. Sometimes the mouse cursor ALSO locks. When the mouse cursor is unresponsive, you don't get the extra dmesg messages.
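To look at the jump described above with finer resolution than the dashboard's polling interval, a small sampler can log MemAvailable and Cached from /proc/meminfo once per second while the model loads. This is a hypothetical helper, and it assumes that on this UMA system GPU allocations are reflected in /proc/meminfo. If the whole system stalls, the sampler stalls too, which shows up as a gap in its timestamps:

```bash
# Hypothetical helpers for watching memory during a model load.

# Print one line: epoch seconds, then MemAvailable and Cached in MiB.
meminfo_line() {
  awk -v now="$(date +%s)" '
    /^MemAvailable:/ { avail = $2 }
    /^Cached:/       { cached = $2 }
    END { printf "%s avail=%dMiB cached=%dMiB\n", now, avail / 1024, cached / 1024 }
  ' /proc/meminfo
}

# Sample every $1 seconds (default 1) until interrupted with Ctrl-C.
sample_meminfo() {
  local interval=${1:-1}
  while true; do
    meminfo_line
    sleep "$interval"
  done
}
```

For example, run sample_meminfo 1 | tee meminfo.log in a separate terminal before loading gpt-oss-120b, then compare where Cached grows and MemAvailable drops against the timestamps of the NVRM lines in dmesg.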

I can always reproduce the single-line error, even if I drop the caches first:

[45822.801325] sh (597128): drop_caches: 3
[45859.583277] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359

The more complex errors don't seem to correlate with the lockups. Maybe it was able to avoid locking up due to more graceful error handling?

ibrunton_smith · November 18, 2025, 6:43pm

Did this ever get resolved? I have a similar experience, even with the context length at the default of 4K.

ralph.lora · November 19, 2025, 6:54pm

Today’s updates solved this issue.
