I think this approach is worth a try:
Can confirm, I get around 50 tok/s pp
Yup, runs here at 45-60 tok/s
The community vLLM container by @eugr works fine for this model too (there is a recipe):