Trtllm vs vllm performance w/ gpt-oss-120b

dlewis.io · January 19, 2026, 12:49am

I am doing some testing with aiperf between trtllm and vLLM and seeing a significant difference in token throughput: ~109 tps with vLLM vs. ~30 tps with trtllm. Am I missing something in my trtllm config, or is trtllm missing some GB10 optimizations that vLLM has?

Tested with trtllm 1.2.0rc6 and 1.2.0rc8. rc8 occasionally throws a CUDA invalid instruction error, so I fall back to rc6. I've tried the configurations in the trtllm DGX Spark playbook, but they don't make a difference.

docker run --rm --gpus all \
  -e TIKTOKEN_ENCODINGS_BASE=/tmp/tiktoken_encodings \
  -v ./tiktoken_encodings:/tmp/tiktoken_encodings \
  --ipc=host --network host \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc8 \
  trtllm-serve serve openai/gpt-oss-120b \
    --port 8000 \
    --backend pytorch \
    --max_seq_len 131072 \
    --max_batch_size 16 \
    --free_gpu_memory_fraction 0.7 \
    --trust_remote_code

Tested vLLM 0.13.0 using the official wheel mentioned on this forum.
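To cross-check the aiperf figures outside the benchmark tool, a single request can be timed against each server in turn. This is a minimal sketch, not part of the original setup: it assumes both trtllm-serve and vLLM expose an OpenAI-compatible `/v1/completions` route that returns a `usage.completion_tokens` field, and the helper names `tokens_per_second` and `measure` are hypothetical.

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Output tokens divided by wall-clock seconds for one request."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def measure(base_url: str, model: str, prompt: str,
            max_tokens: int = 256, timeout_s: float = 600.0) -> float:
    """Time one non-streaming completion and return output tokens/sec.

    Assumption: the server speaks the OpenAI-compatible /v1/completions
    protocol and reports usage.completion_tokens in its JSON response.
    The elapsed time includes prefill and HTTP overhead, so the result
    is a lower bound on the pure decode rate.
    """
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        data = json.load(resp)
    elapsed = time.perf_counter() - start
    return tokens_per_second(data["usage"]["completion_tokens"], elapsed)
```

For example, `measure("http://localhost:8000", "openai/gpt-oss-120b", "Explain MoE routing.")` run against each server (one at a time, since both would bind port 8000) gives a single-stream number. It is only comparable to aiperf's result if aiperf was also run at concurrency 1.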

raphael.amorim · February 19, 2026, 4:22am

TRT-LLM is not on par with vLLM on the Spark right now. The latest numbers for GPT-OSS-120B and other models are on the Spark Arena - LLM Leaderboard.
