Qwen3-Next AWQ 4-bit vs FP8 vs NVFP4 on a single Spark

BTW, the FlashInfer backend lets you fit more context, about 2x as much as the FLASH_ATTN backend. That's with the FP8 model.
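For a rough sense of what "2x more context" means, the number of tokens a KV cache can hold is the memory budget divided by the bytes stored per token. Halving the per-token footprint therefore doubles the context that fits. The post doesn't say why FlashInfer fits more. An FP8 KV cache (1 byte per element instead of 2) is just one assumed mechanism that would produce this effect. The sketch below uses that assumption, and the layer count, KV-head count, head dim and 20 GiB budget are hypothetical placeholders, not measured values.

```python
def kv_bytes_per_token(num_attn_layers, num_kv_heads, head_dim, bytes_per_elem):
    # K and V each store num_kv_heads * head_dim values per attention layer
    return 2 * num_attn_layers * num_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(kv_budget_bytes, per_token_bytes):
    # Whole tokens that fit in the memory left over for the KV cache
    return kv_budget_bytes // per_token_bytes


# Hypothetical values for illustration only (not from the post)
budget = 20 * 1024**3                          # assume 20 GiB free for KV cache
bf16 = kv_bytes_per_token(12, 2, 256, 2)       # 2-byte KV elements
fp8 = kv_bytes_per_token(12, 2, 256, 1)        # assumed 1-byte (FP8) KV elements

print(max_context_tokens(budget, bf16), max_context_tokens(budget, fp8))
```

With these placeholder numbers, the 1-byte case holds exactly twice as many tokens as the 2-byte case.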