I’ll try to answer as best I can without sharing exact project details. For gpt-oss-120b I used OpenAI's recommended sampling parameters, and for Qwen3.5 I used Qwen's recommended sampling parameters (from the Qwen/Qwen3.5-122B-A10B-FP8 model card on Hugging Face). My use case involves providing around 60K of text and performing analysis on it with a specifically tuned prompt, which instructs the model to use the provided text as its context and not rely on general knowledge from its training.
In specific cases, both gpt-oss-120b (with high reasoning enabled) and Qwen3.5 FP8 identified the key critical details I expected them to pick up on. Using the exact same code and sampling parameters with the Intel int4 quant version, the model missed them.
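If you want to make this kind of FP8 vs int4 comparison repeatable, one option is to write down the details you expect the analysis to catch and check each model's output against that list. Below is a minimal sketch, assuming you already have each run's raw output as a string. The `RunResult`, `detail_recall`, and `compare_runs` names and the example details are hypothetical, and the naive substring match is only a stand-in for whatever judgment you'd actually apply to decide whether a detail was caught.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    model: str   # illustrative label, e.g. "FP8" or "int4"
    output: str  # raw model output for the same prompt and context


def detail_recall(output: str, expected_details: list[str]) -> tuple[float, list[str]]:
    """Return (fraction of expected details found, list of missed details).

    Naive case-insensitive substring matching; real key details may need
    fuzzier matching or manual review.
    """
    text = output.lower()
    missed = [d for d in expected_details if d.lower() not in text]
    found = len(expected_details) - len(missed)
    recall = found / len(expected_details) if expected_details else 1.0
    return recall, missed


def compare_runs(runs: list[RunResult], expected_details: list[str]) -> dict[str, tuple[float, list[str]]]:
    """Score every run against the same list of expected details."""
    return {run.model: detail_recall(run.output, expected_details) for run in runs}


if __name__ == "__main__":
    expected = ["clause 4.2", "termination date"]  # hypothetical key details
    runs = [
        RunResult("FP8", "The termination date conflicts with clause 4.2."),
        RunResult("int4", "The document describes a standard agreement."),
    ]
    for model, (recall, missed) in compare_runs(runs, expected).items():
        print(f"{model}: recall={recall:.2f}, missed={missed}")
```

Keeping the prompt, context, and sampling parameters fixed and varying only the quant means any drop in recall points at the quantization rather than the setup.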
This is based on my own testing, and your mileage may vary of course, but I wanted to flag it. Given this model's higher AI scores compared to gpt-oss-120b, I figured moving down to int4 wouldn’t make a big difference, but in my own testing and use cases it did.
