Duh! Yes, that was the problem (and the solution). Binding the process to a specific CPU (cpu0 in my case) yields a constant 3200 MB/s host->dev.
Thanks a lot!
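For anyone landing here later, this is roughly what the pinning looks like. It is only a minimal sketch, assuming Linux and Python 3; `pin_to_cpu` is a name I made up for illustration, not something from the benchmark itself.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict a process to one CPU (Linux only); pid 0 means the calling process."""
    # Hypothetical helper: replace the scheduler affinity mask with a single CPU.
    os.sched_setaffinity(pid, {cpu})
    # Read the mask back so the caller can confirm the binding took effect.
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Pick the lowest-numbered CPU this process is allowed to use
    # (cpu0 on an unrestricted machine, as in the post above).
    target = min(os.sched_getaffinity(0))
    print("pinned to:", pin_to_cpu(target))
```

Child processes inherit the affinity mask, so anything launched afterwards stays on the same CPU. From a shell, `taskset -c 0 <command>` should do the same thing without touching the code.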
Hrrm, that’s the problem with having too many resources (and not knowing how to use them properly).