What is new in CUDA technology that was absent in G7 and previous GPUs?


I want to know which architectural changes since the previous GPU generations (G7 and others) led to the development of CUDA. What features did NVIDIA introduce in the GPU hardware of the G8 and later processors?


“Unified” shaders are a new concept in the G8, I think (and mandated by DirectX 10). That must have forced them to make the processing more generic, and eventually led to CUDA as we know it.

Other than that, you'd better browse the various hardware news sites on the net. They often have pretty well-informed articles about chip internals (leaked information and other “secrets”).


Technically, DX10 didn’t force any particular hardware design, but it strongly encouraged general-purpose shaders ;)

The three-sentence summary goes something like this: DX9 cards (up to the GeForce 7xxx) had a fixed number of pixel shaders and vertex shaders, and this was reflected in the API (or rather, the other way around), so you had different instructions for vertex shader programs and pixel shader programs. Then came DX10 with its Unified Shader Model, which attempted to give all kinds of shaders the same instruction set and make them very similar to each other.

Now, once you have the Unified Shader Model, in which there’s little difference between a pixel shader and a vertex shader, it kinda makes sense to introduce a Unified Shader Architecture in which you also unify the hardware that runs those shaders: you build a single shader processor that is versatile enough to do all kinds of shading, depending on how it’s programmed. And that’s one step away from having fully programmable, general-purpose processing units in the GPU. As a side effect, such an architecture balances the load much better, as there’s never a situation where, e.g., all your vertex shaders are idling because a game uses pixel shading exclusively.

Okay, so those were five sentences.

Unified shaders were a natural progression, but I would say the main new features that the G80 architecture added for compute were global load and store (which allows a C-like language) and shared memory.

It’s worth noting that these were all NVIDIA’s innovations, and not required by DirectX at the time. They have provided the basis for CUDA, OpenCL and DirectCompute.

This interview with Ian Buck has some good history:

Thanks Simon, could you please tell me what we mean by global load and store, and how this feature allows a C-like language? (I already know about shared memory on the G80.)

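It used to be gather but not scatter: a shader could read from arbitrary texture locations, but every pixel could only write at its own position in the frame buffer.
Now every “pixel” or “thread” can write anywhere in the video card’s main memory. That is what makes a C-like language practical, since a C pointer can be read from or written to at any address.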