I do not consider the current architecture to be a scalar design, but rather a compromise between a scalar and a vector design.
When people think of vector machines today, they think of Intel SSE and maybe IBM AltiVec, where a separate unit is explicitly issued instructions that operate on many data elements at once. The problem with these programming models is that they cannot handle per-element control flow – all functional units in the vector unit always have to perform the same computation. There have been several proposals to solve this problem with predication, where certain functional units in the vector unit are turned off depending on the value of a mask register.
For example, on a 4-way vector machine, the code:
a[4] = {1, 2, 3, 4};
b[4] = {1, 2, 3, 4};
c[4] = a + b;
can be handled by a standard vector unit. However, when you add control flow:
condition[4] = {0, 1, 0, 1};
a[4] = {1, 2, 3, 4};
b[4] = {1, 2, 3, 4};
c[4];
if( condition[4] )
{
c = a + b;
}
else
{
c = a - b;
}
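A traditional vector unit cannot handle this, because elements 1 and 3 need the addition while elements 0 and 2 need the subtraction. It can be handled by adding predication, in which case the code is executed logically as: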
if(1) : a[4] = {1, 2, 3, 4};
if(1) : b[4] = {1, 2, 3, 4};
if( condition[4] ) : c = a + b;
if( !condition[4] ) : c = a - b;
In this case, you are performing a different computation for each element depending on the value of condition, but you are still executing on a single vector unit rather than running four scalar units in parallel: both instructions are issued to all four lanes, and the mask decides which lanes keep the result. From what I have read, NVIDIA’s architecture works like this: a logical scalar programming model is mapped onto a physical vector unit. They also have a novel method of handling back edges (loops), which is probably too complicated for me to describe here.
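
To make the predicated version concrete, here is a rough sketch in plain C that simulates it. It is not how any real hardware or vendor API is written; the names LANES, masked_add, masked_sub and predicated_branch are made up for illustration, and the loop over lanes stands in for functional units that would all run in the same cycle.

```c
#include <stddef.h>

#define LANES 4  /* assumed vector width, matching the 4-way example */

/* Hypothetical predicated add: the instruction is issued to every lane,
   but only lanes whose mask entry is nonzero write their result back. */
static void masked_add(int *dst, const int *x, const int *y, const int *mask)
{
    for (size_t lane = 0; lane < LANES; ++lane) {
        if (mask[lane]) {
            dst[lane] = x[lane] + y[lane];
        }
    }
}

/* Hypothetical predicated subtract, same write-back rule as masked_add. */
static void masked_sub(int *dst, const int *x, const int *y, const int *mask)
{
    for (size_t lane = 0; lane < LANES; ++lane) {
        if (mask[lane]) {
            dst[lane] = x[lane] - y[lane];
        }
    }
}

/* The if/else from the example, flattened into two predicated instructions:
   the "then" side runs under condition, the "else" side under !condition. */
static void predicated_branch(const int *condition, const int *a,
                              const int *b, int *c)
{
    int not_condition[LANES];

    for (size_t lane = 0; lane < LANES; ++lane) {
        not_condition[lane] = !condition[lane];
    }

    masked_add(c, a, b, condition);      /* if( condition )  : c = a + b */
    masked_sub(c, a, b, not_condition);  /* if( !condition ) : c = a - b */
}
```

With condition = {0, 1, 0, 1} and a = b = {1, 2, 3, 4}, calling predicated_branch leaves c = {0, 4, 0, 8}. Note that both sides of the branch are always issued, so a divergent branch costs the sum of both paths even though each lane only keeps one result.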