eglSwapBuffers deadlock in libEGL_nvidia.so 595.58.03 — lost wakeup due to TOCTOU on internal flag

Hey,

After switching to Ubuntu 26.04 I got random freezes (one every 2-12 hours of work). I first suspected GNOME/Mutter, but after some digging it looks like libEGL_nvidia has a bug.

I can't tell how long this bug has been in libEGL, or how many people will hit it once they update to Ubuntu 26.04 with GNOME, but it's hard to imagine that my setup and workload are so unique that it will only affect a small group of people in the upcoming months.

Here is a summary from my AI that will hopefully help you reproduce and confirm it:


`libEGL_nvidia.so` in the 595.58.03 proprietary driver deadlocks inside
`eglSwapBuffers` when the GBM buffer pool is exhausted and the call enters
the internal buffer-wait path. The main thread blocks in `pthread_cond_wait`
on a condition variable whose sequence counter is still `0`, meaning the
signal was **never sent** — this is a lost wakeup, not a classical deadlock.

Disassembly shows the signal is gated by an unsynchronized byte flag
(`needs_signal` at offset `0x1f8` of an internal buffer-state object),
read by the signaler and written by the waiter without any locking or
atomic operations between them. Classic TOCTOU.

A ~190-line standalone reproducer (`egl-swap-repro-min.c`, attached) uses
only GBM/EGL on a **single output** and deadlocks within 2 iterations.
The faulting backtrace is offset-for-offset identical to two independently
captured freezes of GNOME Shell (Mutter) during normal desktop use.

## Environment

- **GPU:** NVIDIA GeForce RTX 3060 Ti (GA104)
- **Driver package:** `libnvidia-gl-595` version `595.58.03-0ubuntu2` (Ubuntu 26.04)
- **Kernel:** 7.0.0-12-generic
- **Display server (repro):** direct DRM/GBM, no compositor
- **Display server (original observation):** Mutter 50.0-0ubuntu3, GNOME Shell on Wayland
- **Monitor:** EIZO EV2785 @ 3840x2160 via DisplayPort

**libEGL_nvidia.so.595.58.03 SHA256:**
`fd065b0304401f57a2e9c2bd9e7766043a1ab8bfdad1aa78ee0cdc73f3b638c8`

All assembly offsets below are relative to this exact binary.

## Reproducer

Attached: `egl-swap-repro-min.c` (~190 lines, no vendor extensions, no
`eglSwapBuffersWithDamage`, no application-side threading, single output).

Build and run:

```bash
gcc -O0 -g -o egl-swap-repro-min egl-swap-repro-min.c \
    $(pkg-config --cflags --libs egl gbm glesv2 libdrm)

# run this over SSH: stopping gdm takes the local session (and your shell) with it
sudo systemctl stop gdm   # needed to become DRM master
sudo ./egl-swap-repro-min /dev/dri/card1
```

The reproducer performs a one-time modeset on the first connected
output, then loops: render, `eglSwapBuffers`, `gbm_surface_lock_front_buffer`,
hold the BO for 2 iterations before releasing it back to GBM, queue a
non-blocking page flip. Holding two buffers exhausts the GBM pool and
forces the next `eglSwapBuffers` into the affected internal wait path.

**Expected:** Program completes 1000 iterations and prints
`Completed without deadlock.`

**Observed on 595.58.03:** Hangs at iteration 2 with the main thread
stuck in `pthread_cond_wait` inside `libEGL_nvidia.so`. From another TTY:

```bash
sudo gdb -batch -ex "thread apply all bt full" -p $(pidof egl-swap-repro-min)
```

Holding buffers across frames is valid usage — EGL 1.5 §3.10.1 specifies
that `eglSwapBuffers` blocks internally when no buffers are available.

## Backtrace at the Deadlock

Main thread, single-output reproducer, iteration 2 (full BT attached as
`egl-repro-single-output.bt`):

```
#0  __futex_abstimed_wait_common64 (abstime=0x0)             ← infinite wait
#1  __pthread_cond_wait_common
    cbuffer = {wseq = 0, cond = 0x..., mutex = 0x..., private = 0}
#2  ___pthread_cond_wait
#3  libEGL_nvidia.so +0xc0eec    ← cond-wait wrapper (function at 0xc0ea0)
#4  libEGL_nvidia.so +0xc0f65    ← acquire_lock, timeout = -1
#5  libEGL_nvidia.so +0x939e4    ← buffer wait loop
#6  libEGL_nvidia.so +0x966f6
#7  libEGL_nvidia.so +0xa19a6
#8  libEGL_nvidia.so +0xa2364
#9  libEGL_nvidia.so +0xa246e
#10 libEGL_nvidia.so +0x505b7    ← eglSwapBuffers entry
#11 main at egl-swap-repro-min.c:157   (iter=2, held_len=2)
```

Key evidence:
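- **`cbuffer.wseq = 0`**: glibc handed this thread waiter sequence number 0, so it is the first waiter this condvar has ever had, and no signal or broadcast has woken it since
- **`abstime = 0x0`**: the wait has no timeout
- Mutex `owner = 0`, `lock = 0`: nobody holds the lock, so no thread is in the middle of signaling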

This is a **lost wakeup**, not a classical deadlock. Something was
supposed to broadcast on this condvar and didn't.

### GPU state during the hang

Captured from a second TTY while the reproducer was stuck at iteration 2:

```
+-----------------------------------------+------------------------+----------------------+
|   0  NVIDIA GeForce RTX 3060 Ti     Off |   00000000:21:00.0  On |                  N/A |
|  0%   44C    P8             21W /  200W |     173MiB /   8192MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
|    0   N/A  N/A           41319      G   ./scripts/egl-swap-repro-min            130MiB |
+-----------------------------------------+------------------------+----------------------+
```

The GPU is in a low-power idle state (P8), at 0% utilisation and 44°C. The
reproducer process is still known to the driver and still holds
130 MiB of GPU memory, but no work is queued or running. This rules
out any GPU-side hang, TDR, or command submission stall: the GPU has
nothing to do, because the CPU is stuck waiting for a wakeup that
never comes.

## Root Cause (Disassembly Analysis)

All offsets relative to `libEGL_nvidia.so.595.58.03` (SHA256 above).

### The wait loop (offset 0x939b0)

```asm
939b0: movzbl 0x170(%r15),%eax   ; check "buffer_ready" byte flag
939b8: test   %al,%al
939ba: jne    93b80                ; set → done
939c0: mov    0x38(%rsp),%rsi    ; load timeout from stack
939c5: test   %rsi,%rsi
939c8: je     93b80                ; zero → exit without waiting
939ce: mov    0x90(%r15),%rax    ; load buffer state object
939d8: lea    0x320(%rax),%rdi   ; mutex = buffer_state + 0x320
939df: call   c0f20                ; acquire_lock(mutex, timeout, ...)
```

### acquire_lock (offset 0xc0f20)

```asm
c0f35: test   %rsi,%rsi            ; timeout parameter
c0f38: js     c0f60                ; negative → infinite wait path
c0f3a: ...                         ; non-negative → timed wait path exists
       ...
c0f60: call   c0ea0                ; → pthread_cond_wait (abstime = 0x0)
```

The caller passes a negative timeout, selecting the infinite-wait branch.
A timed wait would self-recover after the timeout even if the signal is lost.
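
To make the two branches concrete, here is a minimal sketch in C of a wait helper that dispatches on the sign of its timeout, the way the disassembly suggests. The struct layout, the name `wait_ready`, and the nanosecond unit are assumptions for illustration only; none of it is taken from the driver's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the driver's internal wait object. */
struct waitobj {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             ready;   /* predicate, protected by mutex */
};

/*
 * Wait until obj->ready is set.
 *   timeout_ns <  0 : wait forever (the branch seen in the backtrace)
 *   timeout_ns >= 0 : give up after timeout_ns nanoseconds
 * Returns 0 once ready is set, ETIMEDOUT if the deadline passed first.
 */
static int wait_ready(struct waitobj *obj, int64_t timeout_ns)
{
    struct timespec deadline = {0, 0};
    int rc = 0;
    int ready;

    assert(obj != NULL);

    if (timeout_ns >= 0) {
        /* A default-initialised condvar measures deadlines on CLOCK_REALTIME. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += timeout_ns / 1000000000LL;
        deadline.tv_nsec += timeout_ns % 1000000000LL;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }
    }

    pthread_mutex_lock(&obj->mutex);
    while (!obj->ready && rc == 0) {
        if (timeout_ns < 0)
            rc = pthread_cond_wait(&obj->cond, &obj->mutex);
        else
            rc = pthread_cond_timedwait(&obj->cond, &obj->mutex, &deadline);
    }
    ready = obj->ready;
    pthread_mutex_unlock(&obj->mutex);

    return ready ? 0 : rc;
}
```

With a non-negative timeout and no signaler, `wait_ready` returns `ETIMEDOUT` instead of hanging, which is the self-recovery mentioned above. With a negative timeout and no signaler it never returns.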

### The signal function (offset 0x8fab0)

```asm
8fac7: movb   $0x1,0x170(%rbx)     ; set buffer_ready = 1
8fae6: lea    0x320(%rax),%rdi     ; same mutex at buffer_state + 0x320
8faed: call   c10e0                 ; pthread_cond_broadcast
```

This function exists and correctly broadcasts on the same mutex/condvar.
But it is **never called**, because of the gate:

### The TOCTOU gate (offset 0x96949)

```asm
96944: call   96dd0                 ; some operation completes
96949: cmpb   $0x0, 0x1f8(%rbx)    ; check "needs_signal" flag
96950: je     96a13                 ; if 0 → SKIP the broadcast entirely
96956: mov    0x90(%rbx),%rdx
9695d: movb   $0x0, 0x1f8(%rbx)   ; clear flag, then fall through to signal
       ...
96a81: jmp    8fab0                 ; → broadcast
```

The waiter sets the same flag before entering `cond_wait`:

```asm
968f6: movb   $0x1, 0x1f8(%rdi)   ; waiter SETS flag before cond_wait
```

### The race

```
Signaler path (buffer completion)      Waiter path (eglSwapBuffers)
─────────────────────────────────      ─────────────────────────────
  read  flag @ 0x1f8 → 0
  skip broadcast, return
                                         write flag @ 0x1f8 = 1
                                         pthread_cond_wait(cond, mutex)
                                         ← waits forever, nobody left
                                           to signal this condvar
```

The flag at offset `0x1f8` is a plain byte. It is read by the signaler
and written by the waiter with **no mutex held** and **no atomic
operations** on either side. There is no happens-before relationship
between the check and the set. Classic TOCTOU.
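
The interleaving above can be replayed in a small C model. This is only a sketch of the pattern as I read it from the disassembly: the struct, the field names, and both function names are hypothetical, and the waiter uses a timed wait so the demo returns instead of hanging. The two sides are called one after the other on a single thread, which makes the bad ordering deterministic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the buffer-state object. */
struct buf_state {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool buffer_ready;   /* models the byte at 0x170 */
    bool needs_signal;   /* models the byte at 0x1f8, touched without the mutex */
};

/* Signaler: broadcasts only if a waiter has already announced itself.
 * Returns true if it broadcast, false if it took the skip branch. */
static bool signaler_complete(struct buf_state *s)
{
    if (!s->needs_signal)            /* the check ...                 */
        return false;                /* ... and the skipped broadcast */
    s->needs_signal = false;
    pthread_mutex_lock(&s->mutex);
    s->buffer_ready = true;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
    return true;
}

/* Waiter: announces itself, then waits. The driver waits forever here;
 * the model gives up after timeout_ms so the lost wakeup is observable. */
static int waiter_wait(struct buf_state *s, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0 && timeout_ms < 1000);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += timeout_ms * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    s->needs_signal = true;          /* the set, too late for the check above */
    pthread_mutex_lock(&s->mutex);
    while (!s->buffer_ready && rc == 0)
        rc = pthread_cond_timedwait(&s->cond, &s->mutex, &deadline);
    pthread_mutex_unlock(&s->mutex);
    return rc;                       /* 0 if woken, ETIMEDOUT if the wakeup was lost */
}
```

Calling `signaler_complete()` first and `waiter_wait()` second reproduces the bad ordering: the signaler returns `false` without broadcasting, and the waiter runs into its timeout with `buffer_ready` still unset. In the reverse order the flag is already set when the signaler checks it, and the wakeup is delivered.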

### Why this can't be worked around from outside

The `pthread_cond_wait`/`pthread_cond_broadcast` synchronization is
entirely internal to `libEGL_nvidia.so` and invisible to the API caller.
No amount of external locking, call ordering, or buffer-management
discipline in the application or compositor can prevent this race —
both the check and the set happen inside the driver, on a flag the
caller cannot observe or synchronize against.

The single-output reproducer confirms this: it runs no threads of its
own, uses no damage regions, no multi-context, no vendor extensions,
and still deadlocks within 2 iterations on the very first invocation
that hits the buffer-wait path.

## Suggested Fixes

Any one of these would resolve the deadlock:

1. **Hold the mutex across both sides** — read and write `needs_signal`
   (offset `0x1f8`) under the same mutex used for
   `pthread_cond_wait`/`pthread_cond_broadcast`.
2. **Make the flag atomic** — replace the plain byte with
   `atomic_compare_exchange` or equivalent; closes the TOCTOU window.
3. **Use `pthread_cond_timedwait`** — the timed path already exists
   at offset `0xc0f88` and would self-recover even if a signal is lost.
4. **Unconditionally broadcast** — remove the `needs_signal` gate and
   always call `pthread_cond_broadcast` on buffer completion. Spurious
   wakeups are harmless and the condvar API explicitly allows them.

(1) or (2) are the principled fixes; (3) is a cheap defence-in-depth
fallback.
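
As an illustration of what (1) combined with (4) could look like, here is a minimal C sketch of the textbook condvar pattern: the predicate is only touched with the mutex held, and the signaler always broadcasts. All names are hypothetical and this is not a patch against the driver, just the shape of the fix under the assumption that `buffer_ready` is the only state the waiter needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed buffer-state object: no separate needs_signal gate. */
struct buf_state_fixed {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool buffer_ready;               /* only read or written with mutex held */
};

/* Signaler: publish the state change and broadcast, both under the mutex. */
static void signaler_complete_fixed(struct buf_state_fixed *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->mutex);
    s->buffer_ready = true;
    pthread_cond_broadcast(&s->cond);    /* harmless if nobody is waiting */
    pthread_mutex_unlock(&s->mutex);
}

/* Waiter: check the predicate under the same mutex before and after waiting. */
static void waiter_wait_fixed(struct buf_state_fixed *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->mutex);
    while (!s->buffer_ready)             /* loop also absorbs spurious wakeups */
        pthread_cond_wait(&s->cond, &s->mutex);
    s->buffer_ready = false;             /* consume the buffer */
    pthread_mutex_unlock(&s->mutex);
}

/* Thread entry point so the signaler can run concurrently with a waiter. */
static void *signaler_thread(void *arg)
{
    signaler_complete_fixed(arg);
    return NULL;
}
```

Because the waiter tests `buffer_ready` while holding the mutex, a completion that arrives before the wait is seen immediately, and one that arrives afterwards cannot slip in between the test and `pthread_cond_wait`. The ordering that hangs today (completion first, wait second) simply returns.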

## Also Observed In The Wild (GNOME Shell / Mutter)

Before building the standalone reproducer I captured the same deadlock
twice in GNOME Shell on the same system during normal desktop use, on
dual 4K monitors. The `libEGL_nvidia.so` offsets are identical across
all three captures:

| Frame                          | Mutter #1 | Mutter #2 | Reproducer |
|--------------------------------|-----------|-----------|------------|
| +0x505b7 (eglSwapBuffers entry) | ✓ | ✓ | ✓ |
| +0xa246e                        | ✓ | ✓ | ✓ |
| +0xa2364                        | ✓ | ✓ | ✓ |
| +0xa19a6                        | ✓ | ✓ | ✓ |
| +0x966f6                        | ✓ | ✓ | ✓ |
| +0x939e4 (buffer wait loop)     | ✓ | ✓ | ✓ |
| +0xc0f65                        | ✓ | ✓ | ✓ |
| +0xc0eec (pthread_cond_wait)    | ✓ | ✓ | ✓ |

All three: mutex `owner=0`, `lock=0`, condvar `wseq=0`. The effect on
the desktop is a hard freeze of the compositor — the only recovery is
`killall -ABRT gnome-shell`, there is no self-healing.

Tracked downstream in Ubuntu as
[LP #2147648](https://bugs.launchpad.net/ubuntu/+source/mutter/+bug/2147648).

## Attachments

- `egl-swap-repro-min.c` — minimal single-output reproducer (~190 lines)
- `egl-repro-single-output.bt` — GDB backtrace from the reproducer
  (iteration 2, `held_len=2`)
- `gnome-shell-freeze-1775733505.bt` — Mutter freeze backtrace #1
- `gnome-shell-freeze-1775750470.bt` — Mutter freeze backtrace #2 (with
  Python-GDB mutex owner extraction)
- `nvidia-bug-report.log.gz` — standard NVIDIA bug report system dump

nvidia-egl-deadlock-attachments.tar.gz (22.9 KB)

nvidia-bug-report.log.gz (407.8 KB)

I have observed the same behavior on Mutter 50.0 and NVIDIA driver 595.58.03.

GPU: NVIDIA Corporation GB205 [GeForce RTX 5070]
Kernel: 6.19.11-1-cachyos

Options I found for myself:

I'm undecided. It's pure frustration after so many hours of freezes, debugging sessions, etc.

I'm also encountering this. I have an NVIDIA GeForce RTX 4070 Super, running driver version 595.58.03 on Arch Linux (package version 595.58.03-1). It started occurring after my most recent package update, which updated the GNOME desktop to version 50 (the NVIDIA drivers were already at the mentioned version at the time).

Just out of curiosity, when you get one of these freezes, what happens if you press:

Ctrl+Alt+F1

If you get a login box, where do you arrive when you log in?

Or any of these: Ctrl+Alt+F2, Ctrl+Alt+F3, Ctrl+Alt+F4?

Nothing. I tried every combination to get a terminal; as I understand it, the client can't tell the GPU to output any new frames. But many thanks for the replies, I was going insane the last couple of days.