The programming guide says they are stored in special registers.
But I don't think there will be much difference between using threadIdx.x everywhere and copying it into a dedicated int variable; the compiler optimizes it the right way.
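If you want to check this on your own setup, here is a minimal sketch of the two styles side by side. The kernel names, the helper runIndexDemo() and the problem size are made up for illustration. Building it with `nvcc -ptx` (or dumping the machine code with `cuobjdump -sass`) lets you compare what the compiler generates for each kernel; I would expect them to look nearly the same, but that depends on your compiler version and target architecture.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Variant A: reads the built-in variables at every use.
__global__ void scaleDirect(float *data, float factor, int n)
{
    if (blockIdx.x * blockDim.x + threadIdx.x < n)
        data[blockIdx.x * blockDim.x + threadIdx.x] *= factor;
}

// Variant B: computes the index once and keeps it in a local int.
__global__ void scaleCached(float *data, float factor, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Hypothetical helper: runs both kernels on identical input and
// returns true if both produce the same (expected) result.
bool runIndexDemo()
{
    const int n = 1000;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.5f), a(n), b(n);

    float *dA = nullptr, *dB = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }

    cudaMemcpy(dA, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, host.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleDirect<<<blocks, threads>>>(dA, 2.0f, n);
    scaleCached<<<blocks, threads>>>(dB, 2.0f, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(a.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), dB, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);

    // 1.5f * 2.0f is exactly 3.0f, so an exact comparison is safe here.
    for (int i = 0; i < n; ++i)
        ok = ok && (a[i] == b[i]) && (a[i] == 3.0f);
    return ok;
}
```

Call runIndexDemo() from your own main() to confirm that both variants give the same output. In the PTX, the built-in shows up as the special register %tid.x, which gets moved into an ordinary register before it is used.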
Two minor notes, just to clear up some misconceptions:
Many architectures, including CUDA, can only operate on values held in registers. Regardless of where threadIdx.x is stored, it will need to be copied to a register to be used in a calculation or as an array index. How long it stays in a register is up to the compiler.
Relating to the previous point, there is no one-to-one relationship between C variables and registers. The compiler (and the PTX assembler in CUDA) decides what to store in a register and for how long. A single expression might require several registers to compute, but after the value is used, the compiler is free to throw it away and use the register for something else. If there are not enough registers to hold all of the values that are needed at the same time, then the compiler will spill some of them to “local memory”, which is just a section of global memory where storage can be set aside for each thread.
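To see the register and local memory behavior for yourself, here is a rough sketch of a kernel that tries to keep many values alive at once. The names manyLiveValues, runSpillDemo and NUM_ACC are invented for this example. Compiling with `nvcc -Xptxas -v` prints the register count per thread along with any spill stores and loads, and adding something like `-maxrregcount=16` caps the registers so spilling becomes more likely. Whether anything actually spills depends on the compiler version, the target architecture and the flags, so treat it as an experiment rather than a guaranteed result.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define NUM_ACC 64  // hypothetical number of simultaneously needed values

// Each thread fills NUM_ACC temporaries and then combines them in a
// different order, which encourages the compiler to keep them all around.
__global__ void manyLiveValues(const float *in, float *out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float x = in[i];
    float acc[NUM_ACC];

    #pragma unroll
    for (int k = 0; k < NUM_ACC; ++k)
        acc[k] = x * (float)(k + 1);

    float sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < NUM_ACC; ++k)
        sum += acc[k] * acc[NUM_ACC - 1 - k];

    out[i] = sum;
}

// Hypothetical helper: runs the kernel and checks it against a CPU loop.
bool runSpillDemo()
{
    const int n = 512;
    const size_t bytes = n * sizeof(float);
    std::vector<float> hIn(n, 1.0f), hOut(n, 0.0f);

    // CPU reference for an input value of 1.0f.
    float ref = 0.0f;
    for (int k = 0; k < NUM_ACC; ++k)
        ref += (float)(k + 1) * (float)(NUM_ACC - k);

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);
    manyLiveValues<<<(n + 127) / 128, 128>>>(dIn, dOut, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    for (int i = 0; i < n; ++i)
        ok = ok && (std::fabs(hOut[i] - ref) <= 1e-3f * ref);
    return ok;
}
```

The result is the same whether or not the compiler spills; only the speed and the numbers reported by ptxas change. Call runSpillDemo() from your own main() if you want to run it.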