packed sse-like math-funcs for float4/int4 etc

mikemoik · November 28, 2008, 2:07pm

hello,

I couldn't find any packed math functions for vector data types in the specs, like SSE offers on the CPU.
This would be a great improvement, especially for 3D calculations and projective geometry.
I know the ALU would need 128-bit registers for this. Is that possible with CUDA's ALUs?

greets,
moik

Simon_Green · November 28, 2008, 2:17pm

See the FAQ. Current NVIDIA GPUs have a scalar architecture, so there is no advantage to vector types.
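Because each thread is scalar, a "packed" float4 operation is simply four scalar operations issued by the same thread, so no 128-bit ALU is required. A minimal sketch in plain C++, assuming a hypothetical `Float4` struct laid out like CUDA's built-in `float4` (fields `x`, `y`, `z`, `w`). CUDA's `float4` is a plain struct without arithmetic operators, so helpers like these are typically user-written; in device code they would use `float4` directly and be marked `__host__ __device__`. The operator overloads and the `get` helper are illustrative, not part of the CUDA API.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's float4: four 32-bit floats, 128 bits total.
struct Float4 {
    float x, y, z, w;
};

static_assert(sizeof(Float4) == 16, "four floats should occupy 128 bits");

// Component-wise add: four independent scalar adds, no 128-bit ALU needed.
inline Float4 operator+(Float4 a, Float4 b) {
    return Float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Component-wise multiply, same pattern.
inline Float4 operator*(Float4 a, Float4 b) {
    return Float4{a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w};
}

// 4-component dot product (e.g. for homogeneous coordinates):
// four multiplies and three adds, all scalar.
inline float dot(Float4 a, Float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Indexed access to a component (0..3), checked in debug builds.
inline float get(Float4 v, int i) {
    assert(i >= 0 && i < 4);
    switch (i) {
        case 0: return v.x;
        case 1: return v.y;
        case 2: return v.z;
        default: return v.w;
    }
}
```

On a scalar architecture the compiler emits one instruction per component either way, which is why writing the operation as a single "packed" call would not make it faster.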

mikemoik · November 28, 2008, 2:28pm

ok.

Are there plans for this in the future?

Simon_Green · November 28, 2008, 4:52pm

You can think of the multiprocessors as 32-wide vector units. This (excellent) paper may help explain things:
http://scyourway.nacse.org/conference/view/pap341
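To make the "32-wide vector unit" picture concrete: in a one-dimensional block, consecutive threads are grouped into warps of 32, so a thread's warp (which vector it belongs to) and its lane (which element of that vector it is) follow from integer division. A sketch in plain C++, assuming a 1-D block and a warp size of 32; in a kernel the index would come from `threadIdx.x`, and CUDA exposes the warp size as the built-in `warpSize`. The function names are hypothetical.

```cpp
#include <cassert>

// Warp size assumed here (32 on the NVIDIA GPUs discussed in this thread).
constexpr int kWarpSize = 32;

// Which 32-wide "vector" (warp) a thread of a 1-D block belongs to.
inline int warpIndex(int threadIdxInBlock) {
    assert(threadIdxInBlock >= 0);
    return threadIdxInBlock / kWarpSize;
}

// Which element (lane) of that vector the thread occupies.
inline int laneIndex(int threadIdxInBlock) {
    assert(threadIdxInBlock >= 0);
    return threadIdxInBlock % kWarpSize;
}

// Number of warps needed for a block of the given size (rounded up).
inline int warpsPerBlock(int blockSize) {
    assert(blockSize > 0);
    return (blockSize + kWarpSize - 1) / kWarpSize;
}
```

A block of 100 threads therefore occupies 4 warps, the last one only partly filled, which is one reason block sizes are usually chosen as multiples of 32.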

alex_dubinsky · November 28, 2008, 7:05pm

GPUs made a wonderful innovation just a couple of years back. They had been SIMD for a long time, but someone very clever realized that you can turn the concept on its head and create SIMT (single instruction, multiple threads). In SIMT you still have vector units, but each element of the vector behaves as an independent thread. As long as all threads in the vector execute in lock-step, performance is just as good as with SIMD, but with far less coding effort. It’s a much more elegant solution than an autovectorizing compiler. If the threads in a vector need to do different things, they can, with only a partial performance penalty (much smaller than if a compiler had to give up on vectorization entirely).
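A simplified model of where that partial penalty comes from: the warp evaluates the branch condition per lane, then issues each side of the branch that at least one lane took, with the other lanes masked off. If all 32 lanes agree, one path is issued; if they split, both paths are issued one after the other. This plain C++ sketch illustrates the idea only and is not a description of the actual hardware scheduler; the per-lane work (one Collatz step) and the names `Warp` and `runBranch` are hypothetical.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;  // one value per lane

// Each lane computes: x = (x even) ? x / 2 : 3 * x + 1.
// Returns how many branch paths the warp had to issue:
// 1 if all lanes agree, 2 if they diverge.
inline int runBranch(Warp& lanes) {
    std::array<bool, kWarpSize> takeThen{};
    bool anyThen = false, anyElse = false;
    for (int i = 0; i < kWarpSize; ++i) {   // evaluate condition per lane
        assert(lanes[i] > 0);
        takeThen[i] = (lanes[i] % 2 == 0);
        anyThen = anyThen || takeThen[i];
        anyElse = anyElse || !takeThen[i];
    }
    int passes = 0;
    if (anyThen) {                          // "then" path: only masked-in lanes write
        for (int i = 0; i < kWarpSize; ++i)
            if (takeThen[i]) lanes[i] = lanes[i] / 2;
        ++passes;
    }
    if (anyElse) {                          // "else" path: the remaining lanes
        for (int i = 0; i < kWarpSize; ++i)
            if (!takeThen[i]) lanes[i] = 3 * lanes[i] + 1;
        ++passes;
    }
    return passes;
}
```

In this model a divergent warp costs the sum of both paths rather than the longer one, but it still processes all 32 lanes in those passes, which is why the slowdown is partial rather than a fallback to fully serial code.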

In CUDA, these concepts take the names of “warp”, “divergence”, “coalescing”, etc.
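Coalescing fits the same picture: the cost of a warp-wide load depends on how many memory segments the 32 lanes' addresses fall into. The sketch below assumes aligned 128-byte segments and elements that do not straddle a segment boundary; this is a simplification, since the exact rules differ between GPU generations (early hardware, for example, coalesced per half-warp). The function name `segmentsTouched` and the segment size are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

constexpr int kWarpSize = 32;

// Simplified coalescing model: count the distinct aligned 128-byte
// segments touched when lane i reads address base + i * stride * elemBytes.
inline int segmentsTouched(std::uint64_t baseAddr, int strideElems, int elemBytes) {
    assert(strideElems >= 0 && elemBytes > 0);
    constexpr std::uint64_t kSegmentBytes = 128;
    std::set<std::uint64_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::uint64_t addr = baseAddr
            + static_cast<std::uint64_t>(lane) * strideElems * elemBytes;
        segments.insert(addr / kSegmentBytes);  // segment containing this lane's address
    }
    return static_cast<int>(segments.size());
}
```

With 4-byte floats, unit stride from an aligned base touches one segment, stride 2 touches two, and stride 32 touches one per lane, which is why adjacent threads should read adjacent elements.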

The extra hardware needed to turn SIMD into SIMT is small compared with the benefits, and there’s little reason to go back. The only downside is having to launch more logical threads.
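The "more logical threads" point in numbers: with one thread per element, a launch over `n` elements needs ceil(n / threadsPerBlock) blocks, and the last block may contain threads with no element to process, which the kernel has to guard with a bounds check. A minimal C++ sketch of that arithmetic; `blocksFor` and `threadsLaunched` are hypothetical helpers, not CUDA API functions.

```cpp
#include <cassert>

// Number of blocks needed so that every one of n elements gets its own
// logical thread (ceiling division).
inline long long blocksFor(long long n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads actually launched; the surplus (launched - n) must do nothing.
inline long long threadsLaunched(long long n, int threadsPerBlock) {
    return blocksFor(n, threadsPerBlock) * threadsPerBlock;
}
```

In a CUDA program the result would feed a launch such as `myKernel<<<blocksFor(n, 256), 256>>>(...)` (with `myKernel` a hypothetical kernel), where each thread checks `if (i < n)` before touching memory.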

Related topics (all in CUDA Programming and Performance)

Topic	Replies	Views	Activity
Vector operations, swizzle and macros in CUDA	3	8952	May 20, 2009
SIMD on GPU	6	18021	April 29, 2009
SIMD Versus SIMT: What is the difference between SIMT vs SIMD	15	26281	August 20, 2010
Why scalar processors?	21	18707	June 26, 2009
SIMT, SIMD, SPMD	2	18161	June 6, 2010
SIMT == SIMD?	4	26212	April 3, 2009
Future support/extension of CUDA SIMD intrinsics	4	2494	September 29, 2016
Significant speedup with vector types - why?	7	8249	July 15, 2010
vector data types: Speedup by Vectorizing	11	6495	December 14, 2007
Where are Cg's vector operations in CUDA: are vector operations completely missing	3	10037	April 2, 2007