How is SIMT “Single Instruction”?

I’m trying to understand SIMT and also how it differs from SIMD.

I’m wondering how SIMT is “Single Instruction” when:

  1. each thread has its own instruction address counter
  2. each thread can have an independent execution path

If it truly is single instruction, what is the purpose of these two features? I feel I'm missing some subtle difference between the two architectures.

Start by reading the programming guide:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture

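That section explains how each thread can have an independent execution path in a SIMT architecture. In short, a warp executes one common instruction at a time. When threads of a warp diverge at a data-dependent branch, the warp executes each branch path taken in turn, disabling the threads that are not on that path. So the instruction really is "single" at any given moment, while the per-thread state is what lets threads disagree about which path they are on.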
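To make that concrete, here is a rough sketch in plain host-side C++ (not CUDA, and not how the hardware is actually built) of a single warp running `if (x % 2 == 0) x = x * 2; else x = x + 100;`. The names `WarpResult` and `run_warp` are made up for illustration. It assumes the classic model of one active mask per warp and ignores the independent thread scheduling of newer GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of one warp (hypothetical names, illustration only).
// Every issue step broadcasts ONE instruction to all lanes; an active
// mask decides which lanes actually commit the result.
struct WarpResult {
    std::vector<int> values;  // per-lane value of x after the branch
    int issued = 0;           // how many instructions the warp issued
};

// Models: if (x % 2 == 0) x = x * 2; else x = x + 100;
WarpResult run_warp(std::vector<int> x) {
    assert(x.size() <= 32);  // a warp has at most 32 lanes
    const std::size_t n = x.size();
    const std::uint32_t all = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);

    WarpResult r;

    // Issue 1: evaluate the predicate. All lanes are active.
    std::uint32_t even_mask = 0;
    for (std::size_t lane = 0; lane < n; ++lane) {
        if (x[lane] % 2 == 0) even_mask |= (1u << lane);
    }
    ++r.issued;
    const std::uint32_t odd_mask = all & ~even_mask;

    // Issue 2: the "then" path, skipped entirely if no lane takes it.
    // Lanes outside even_mask are disabled for this instruction.
    if (even_mask != 0) {
        for (std::size_t lane = 0; lane < n; ++lane) {
            if (even_mask & (1u << lane)) x[lane] *= 2;
        }
        ++r.issued;
    }

    // Issue 3: the "else" path, skipped entirely if no lane takes it.
    if (odd_mask != 0) {
        for (std::size_t lane = 0; lane < n; ++lane) {
            if (odd_mask & (1u << lane)) x[lane] += 100;
        }
        ++r.issued;
    }

    r.values = x;
    return r;
}
```

Calling `run_warp({0, 1, 2, 3})` takes three issue steps because the lanes diverge and both paths are serialized, whereas `run_warp({2, 4, 6})` takes only two because every lane agrees and the "else" path is never issued. Each lane still ends up with the result of its own path, which is the sense in which threads are independent even though only one instruction is issued at a time.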