[Question] Per-Thread Program Counters

Hi friends, I am studying GPU programming, and while reading about GPU compute capability I came across this sentence in the NVIDIA TESLA V100 GPU ARCHITECTURE document:
“The per-thread program counter (PC) that forms part of the improved SIMT model typically requires two of the
register slots per thread.”
Could you tell me why the per-thread program counter requires two register slots per thread? Thanks.

I would suggest posting this question in the accelerated-computing/cuda/cuda-programming-and-performance forum, as this one covers Networking / Cumulus Linux.

Thanks for your suggestion.

A program counter is a pointer into a 64-bit address space. Registers are 32 bits each, so holding a 64-bit pointer takes 2 registers. Because the PC in this case is per-thread and can differ from one thread to the next, each thread needs its own 2 registers to hold it.
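
To make the arithmetic concrete, here is a small host-side C++ sketch of a 64-bit value being held in two 32-bit slots. It only illustrates the bit layout; it is not how the hardware is actually implemented, and the names `Reg32`, `PcRegisterPair`, `split_pc`, and `join_pc` are made up for this example.

```cpp
#include <cstdint>

// Hypothetical model of one 32-bit register slot.
using Reg32 = std::uint32_t;

// A 64-bit program counter stored across two 32-bit register slots.
struct PcRegisterPair {
    Reg32 lo;  // bits 0..31 of the PC
    Reg32 hi;  // bits 32..63 of the PC
};

// Split a 64-bit PC into its low and high 32-bit halves.
constexpr PcRegisterPair split_pc(std::uint64_t pc) {
    return PcRegisterPair{static_cast<Reg32>(pc & 0xFFFFFFFFu),
                          static_cast<Reg32>(pc >> 32)};
}

// Rebuild the 64-bit PC from the two halves.
constexpr std::uint64_t join_pc(PcRegisterPair r) {
    return (static_cast<std::uint64_t>(r.hi) << 32) | r.lo;
}

// One 64-bit PC occupies exactly two 32-bit slots.
static_assert(sizeof(PcRegisterPair) == 2 * sizeof(Reg32),
              "a 64-bit PC needs two 32-bit registers");
```

For instance, `split_pc(0x0000000123456789ULL)` puts `0x23456789` in `lo` and `0x1` in `hi`, and `join_pc` reassembles the original value. With one such pair per thread, that is two register slots per thread.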

Thank you for your answer. It was really clear and I gained an understanding. Have a good day.
