Try building as I suggested above: a custom build of spark-vllm-docker with the sliding window PR plus the FlashQLA mod on board.
I was lukewarm on DFlash (and still feel it won’t leap ahead until DDTree arrives), but on my real workloads with challenging text, DFlash on this build now matches MTP=3 at context lengths out to 30k+ for the Unsloth NVFP4 and PrismaQuant versions.