I am surprised that I only get a maximum of roughly 12 t/s with the mxfp4 version via the llama.cpp WebUI (via llama-bench I get 59 t/s, tg128 @ d4096). With the Q8-XL version I get 54 t/s both via llama-bench and via the WebUI (and continue.dev).
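For reference, this is roughly the llama-bench invocation I would expect to reproduce that tg128 @ d4096 number (the model path is a placeholder, and -p 0 just skips the prompt-processing test):

/home/cjg/Projekte/01_llama.cpp/llama.cpp/build/bin/llama-bench -m /path/to/gpt-oss-120b-mxfp4.gguf -ngl 999 -fa 1 -p 0 -n 128 -d 4096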
My command line (I use the long-form arguments because they are easier for me to understand even after some time has passed):
/home/cjg/Projekte/01_llama.cpp/llama.cpp/build/bin/llama-server \
    -hf unsloth/gpt-oss-120b-GGUF:Q8_K_XL \
    --alias "gpt-oss-120b|Q8-XL" \
    --jinja \
    --gpu-layers 999 \
    --ctx-size 128000 \
    --host 0.0.0.0 \
    --port 51011 \
    --flash-attn 1 \
    --batch-size 2048 \
    --ubatch-size 2048 \
    --no-mmap \
    --log-file /home/cjg/.cache/llama.cpp/log/llama-server.log \
    --log-timestamps \
    --log-verbosity 3
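To narrow down whether the WebUI itself is the bottleneck, one thing that might help is timing a raw request against the server's OpenAI-compatible endpoint (host/port taken from my command above; the prompt and max_tokens are arbitrary):

time curl -s http://localhost:51011/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-oss-120b|Q8-XL","messages":[{"role":"user","content":"Write about 300 words on llamas."}],"max_tokens":300}'

Dividing completion_tokens from the usage field of the response by the wall-clock time gives a rough tokens/s figure without the WebUI in the loop; if that matches llama-bench, the slowdown is on the client side.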
Is this also the case for you, or is something wrong with my command line?