Turing performance regarding integer shifts, popc, ffs

Looking at the throughput of integer operations here:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions

I am unsure where Turing falls regarding integer shifts, popc, and ffs. I understand that Turing specifically improved integer performance to match its float performance, but does that cover only integer math such as add/subtract/multiply, with no improvement for shift/popc/ffs etc.? I also noticed that AND/OR/XOR throughput dropped on Volta, and I am not sure what it is like on Turing.

For context, I just tried using popc/ffs on Pascal and was disappointed with the performance. I looked them up in the guide and was surprised their throughput wasn’t better, so I am now trying an alternative that uses multiple 32-bit integer adds/subtracts. AND/XOR/OR are also sometimes useful for avoiding conditionals, so hopefully their throughput has gone back up since Volta?
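
To make the two ideas above concrete, here is a minimal sketch in plain C++ (the same expressions would compile as CUDA device code). It is an assumption for illustration, not necessarily the alternative described above: `popcount_swar` is the well-known SWAR bit count, which replaces a single popc with adds, subtracts, ANDs and shifts, and `select_branchless` shows how AND/XOR can stand in for a conditional. Both function names are hypothetical, and whether either beats the native instruction on a given architecture would need to be measured.

```cpp
#include <cstdint>

// SWAR population count: sums bits in parallel within one 32-bit word.
// Note that it still relies on shifts, so it only pays off when
// shift/add/AND throughput is high relative to popc throughput.
constexpr std::uint32_t popcount_swar(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    x = x + (x >> 8);                                 // fold bytes together
    x = x + (x >> 16);
    return x & 0x3Fu;                                 // result is in 0..32
}

// Branchless select: returns a if cond is true, otherwise b.
constexpr std::uint32_t select_branchless(bool cond,
                                          std::uint32_t a,
                                          std::uint32_t b) {
    // mask is all ones when cond is true, all zeros otherwise.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return b ^ ((a ^ b) & mask);
}
```

For example, `popcount_swar(0xFFu)` evaluates to 8, and `select_branchless(false, 7u, 9u)` evaluates to 9.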

I hope ffs/popc get much higher sustained performance on future GPUs.