This is NVIDIA being intentionally vague, and thus more technical marketing than information useful to programmers. By contrast, Intel documents (or at least used to document; they, too, have become more secretive about micro-architectural details) how each instruction maps to the available five or six issue ports, and which functional units are provided at each port.
Why are vendors being secretive about details of their processor implementations? They think it hurts their competitive position and they do not want the competition to know what their “secret sauce” is. With the effective end of Moore’s Law, the competitive battleground over the next few years will be microarchitecture, so I would expect vendors to become more guarded than ever.
In the absence of detailed information, all we can tell from the language above is that there is some amount of dual-issue capability between integer and floating-point instructions on Turing, but we don't know what restrictions apply. So without targeted experiments (one possible probe is sketched after the bullet point below), the answers to the OP's questions would appear to be
- On some GPU architectures, with unknown restrictions
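For anyone inclined to run that experiment, here is one possible probe. It is very much a sketch, and the kernel names, iteration count, and launch configuration are all invented here: it times a dependent FP32 FMA chain, a dependent integer chain, and both chains interleaved in one kernel. If the mixed kernel takes roughly as long as the slower of the two single-type kernels, the two instruction classes issued in parallel; if it takes roughly their sum, they competed for the same issue slots.

```cuda
// Sketch of a dual-issue probe; error checking omitted for brevity.
// All names and parameters are placeholders for illustration.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N_ITER = 1 << 20;

__global__ void fp_chain(float *out, float a, float b)
{
    float f = a;
    for (int k = 0; k < N_ITER; k++) f = f * a + b;  // dependent FFMA chain
    out[threadIdx.x] = f;                            // keep result live
}

__global__ void int_chain(int *out, int c, int d)
{
    int i = c;
    for (int k = 0; k < N_ITER; k++) i = i * c + d;  // dependent integer chain
    out[threadIdx.x] = i;
}

__global__ void mixed_chain(float *fo, int *io, float a, float b, int c, int d)
{
    float f = a;
    int i = c;
    for (int k = 0; k < N_ITER; k++) {
        f = f * a + b;  // the two chains are independent of each other,
        i = i * c + d;  // so a dual-issue machine could overlap them
    }
    fo[threadIdx.x] = f;
    io[threadIdx.x] = i;
}

int main(void)
{
    float *fo; int *io;
    cudaMalloc(&fo, 32 * sizeof(float));
    cudaMalloc(&io, 32 * sizeof(int));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float t[3];
    fp_chain<<<1, 32>>>(fo, 1.0001f, 0.5f);   // warm-up
    cudaDeviceSynchronize();
    for (int run = 0; run < 3; run++) {
        cudaEventRecord(start);
        if (run == 0) fp_chain<<<1, 32>>>(fo, 1.0001f, 0.5f);
        if (run == 1) int_chain<<<1, 32>>>(io, 3, 7);
        if (run == 2) mixed_chain<<<1, 32>>>(fo, io, 1.0001f, 0.5f, 3, 7);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&t[run], start, stop);
    }
    // t[2] ~ max(t[0], t[1]) suggests parallel issue;
    // t[2] ~ t[0] + t[1]    suggests the chains competed for issue slots
    printf("fp %.3f ms  int %.3f ms  mixed %.3f ms\n", t[0], t[1], t[2]);
    cudaFree(fo);
    cudaFree(io);
    return 0;
}
```

One caveat: the loop control itself contributes integer instructions even in the "pure" FP kernel, so it is worth checking the generated machine code (cuobjdump -sass) to confirm the compiler actually kept the chains intact.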
In practical terms, I would suggest simply trying to exploit the potential parallelism (beyond the mix of integer and floating-point instructions that falls out of the compiler naturally) and measuring whether performance improves. If you achieve success, report back here to save fellow programmers some work :-)
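To make that concrete with an (entirely hypothetical) example: one way to hand the scheduler exploitable parallelism is to give each loop iteration independent integer and floating-point work, e.g. accumulating a floating-point sum while folding the same data into an integer hash. The kernel name and the use of FNV-1a hashing here are just for illustration:

```cuda
// Hypothetical example: independent FP and integer chains per iteration.
#include <cuda_runtime.h>

__global__ void sum_and_checksum(const float *x, int n,
                                 float *sum, unsigned int *crc)
{
    float s = 0.0f;
    unsigned int h = 2166136261u;   // FNV-1a offset basis
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += gridDim.x * blockDim.x) {       // grid-stride loop
        float v = x[i];
        s += v;                                      // floating-point chain
        h = (h ^ __float_as_uint(v)) * 16777619u;    // integer chain, independent of s
    }
    atomicAdd(sum, s);   // combine per-thread partial results
    atomicXor(crc, h);
}
```

Since `s` and `h` never feed into each other, each iteration presents the scheduler with two independent dependency chains; whether the hardware can actually issue them in the same cycle is exactly the question the documentation leaves open.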