What are Uniform Registers in Turing?

I found a set of instructions that operate on Uniform Registers in the Turing binary ISA, under the category “Uniform Datapath Instructions”. What is a Uniform Register? Is it a common register shared by all threads in a warp, or by the whole CTA?

At first glance, I thought it was a companion to independent thread scheduling, but I found that only integer instructions are supported. I know from the whitepaper that Turing added a new path for integer instructions, so is this how it is implemented? The name seems confusing, though…

Well, I just want to know how these instructions work and how we can benefit from them~
Any information is appreciated~

Some description is available here:


The purpose of the uniform data path is to allow efficient scheduling of nearly-continuous floating point instructions which are interrupted by occasional integer instructions. In some circumstances, the integer operations may be moved to the uniform data path.
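
To make the description above more concrete, here is a minimal sketch of the kind of code where this could matter. The kernel name `scaled_column_sum`, the helper `max_abs_error_of_demo`, and the data layout are hypothetical and chosen only for illustration. The loop counter `r` and the offset `r * cols` have the same value for every thread in a warp, so they are the sort of integer work the compiler may be able to place on the uniform data path, while the per-thread `fmaf` stays on the FP32 path. Whether the compiler actually does so is its decision and is not guaranteed.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread sums one column of a row-major matrix,
// scaling every element by `scale`.
__global__ void scaled_column_sum(const float* in, float* out,
                                  int rows, int cols, float scale)
{
    // Per-thread value: differs across the threads of a warp.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;

    float acc = 0.0f;
    for (int r = 0; r < rows; ++r) {
        // Warp-uniform integer work: r and r * cols depend only on the loop
        // counter and a kernel argument, so all threads hold the same value.
        // This is a candidate for the uniform data path (an assumption about
        // what the compiler may choose to do, not a guarantee).
        int row_offset = r * cols;

        // Per-thread floating-point work on the regular FP32 path.
        acc = fmaf(in[row_offset + col], scale, acc);
    }
    out[col] = acc;
}

// Hypothetical host helper: runs the kernel on small test data and returns
// the largest absolute difference from a CPU reference.
// CUDA error checking is omitted to keep the sketch short.
float max_abs_error_of_demo()
{
    const int rows = 8, cols = 1000;
    const float scale = 2.0f;

    std::vector<float> h_in(rows * cols), h_out(cols, 0.0f);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = 0.5f * (i % 7);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, h_out.size() * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (cols + threads - 1) / threads;
    scaled_column_sum<<<blocks, threads>>>(d_in, d_out, rows, cols, scale);

    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // CPU reference using the same accumulation order as the kernel.
    float max_err = 0.0f;
    for (int c = 0; c < cols; ++c) {
        float ref = 0.0f;
        for (int r = 0; r < rows; ++r) ref = fmaf(h_in[r * cols + c], scale, ref);
        max_err = fmaxf(max_err, fabsf(ref - h_out[c]));
    }
    return max_err;
}

int main()
{
    printf("max abs error: %g\n", max_abs_error_of_demo());
    return 0;
}
```

One way to check what the compiler did is to build for Turing with `nvcc -arch=sm_75 -o demo demo.cu` (the file name is an assumption) and dump the machine code with `cuobjdump -sass demo`. Uniform datapath instructions appear with a `U` prefix and operate on `UR` registers; if none show up, the compiler kept this work on the regular integer path.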

Thanks Robert, for answering this ancient thread~

Since you have picked up this topic again, I have a few more questions.

As stated in https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#sm-scheduling:

The schedulers seem to have twice the issue throughput that FP32 instructions need (64 cores per SM / 4 schedulers = 16 cores per scheduler). AFAIK, Turing does not have dual-issue capability. Thus, to make the uniform data path run simultaneously with FP instructions, the scheduler would have to issue a uniform integer instruction to one warp and, in the next cycle, issue an FP32 instruction to another warp. Is this what the Turing whitepaper means on page 11: “The Turing SM supports concurrent execution of FP32 and INT32 operations”?

Does it also work for INT32 instructions on ordinary registers, or only on uniform registers?

Is this also the case for co-issuing memory load/store, branch, and SFU instructions?

Thanks very much~