Vector operations, swizzle and macros in CUDA

Whether the GPU is a SIMD processor that operates on vectors, or a collection of scalar threads running concurrently, is somewhat a matter of perspective. The programming guide describes it as SIMT (single instruction, multiple thread), which is somewhat like SIMD except that each lane is programmed as an independent scalar thread.

But regardless of whether you consider it SIMD, the way it's generally used is different. Instead of having a vector contain a single float4 and, say, adding two 4-element vectors in a single instruction like C = A + B, what's more common is to have each thread be responsible for a different point. Then it takes 4 instructions to add 32 4-element vectors: C[id].x = A[id].x + B[id].x, C[id].y = A[id].y + B[id].y, C[id].z = A[id].z + B[id].z, and C[id].w = A[id].w + B[id].w (where id is a thread ID that ranges from 0 to 31). So these 4 instructions produce 32 resulting float4 outputs, because the 32 threads run concurrently. It's data-parallel, but not parallel at the level of floats within a float4. It's not parallel across R, G, B, A within a pixel, but rather parallel across pixels. Or across vectors or nodes or "items", depending on what your application is. If that makes any sense.
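Here's a minimal sketch of that pattern as a kernel. The kernel name add_float4 and the bounds check are my own additions for illustration, but the body is exactly the four per-component adds described above:

```
// Each thread owns one float4. The four component additions are
// separate scalar instructions, and all 32 threads of a warp execute
// each of them together, so 4 instructions yield 32 float4 results.
__global__ void add_float4(const float4 *A, const float4 *B, float4 *C, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) {
        C[id].x = A[id].x + B[id].x;
        C[id].y = A[id].y + B[id].y;
        C[id].z = A[id].z + B[id].z;
        C[id].w = A[id].w + B[id].w;
    }
}
```

Launched with something like add_float4<<<(n + 255) / 256, 256>>>(dA, dB, dC, n), each warp of 32 threads retires those four adds in lockstep.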

So every instruction is a vector instruction in the sense that if all threads are doing the same thing, it executes across all 32 threads of a warp as a single instruction. If threads within a warp are not executing the same instruction, the hardware runs the divergent paths sequentially to preserve the correctness of the threaded model, but this is avoided as much as possible because performance deteriorates quickly.
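To make the divergence case concrete, here's a hypothetical kernel (name and arithmetic made up for illustration) where the branch condition depends on the thread ID, so threads within the same warp disagree and the hardware has to serialize the two paths:

```
// The branch condition differs within a warp, so the hardware executes
// both sides one after the other, with the inactive threads masked off.
__global__ void divergent_scale(float *out, const float *in, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;

    if (id % 2 == 0)
        out[id] = in[id] * 2.0f;   // half the warp is active here...
    else
        out[id] = in[id] + 1.0f;   // ...and the other half here
}
```

A condition that is uniform within each warp, for example one based on blockIdx.x or on id / 32, avoids the serialization, since the whole warp takes the same path.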