- How does the GPU’s graphics processing pipeline work?
There are multiple public sources for understanding the 3D graphics pipeline. NVIDIA GPUs expose a 3D graphics engine and multiple compute engines, among other engines.
The 3D engine has a hardware pipeline with many generic, reusable components such as the Streaming Multiprocessors (SMs) and the memory subsystem. In addition, the fixed-function portion of the pipeline is accelerated by custom 3D units such as graphics-specific work distributors, the primitive distributor, and the pre-ROP, raster, and ROP units.
1-1) How the Texture Unit works
Documentation can be found in numerous whitepapers, in the DirectX and Vulkan programming guides, and in the CUDA Programming Guide. NVIDIA does not disclose architecture-specific details.
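From the CUDA side, the texture unit is exercised through texture objects: the hardware performs addressing, format conversion, and filtering on each fetch. The sketch below is only an illustration (kernel and variable names are made up); it samples a single-channel 2D texture with bilinear filtering and clamped addressing:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: each thread samples the texture at a normalized
// coordinate; the texture unit does the addressing and bilinear filtering.
__global__ void sampleKernel(cudaTextureObject_t tex, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float u = (i + 0.5f) / n;            // normalized x coordinate
        out[i] = tex2D<float>(tex, u, 0.5f); // filtered fetch through the texture unit
    }
}

int main() {
    const int W = 64, H = 64, N = 256;

    // Fill a host image and copy it into a cudaArray (the texture's backing store).
    float hImg[W * H];
    for (int i = 0; i < W * H; ++i) hImg[i] = (float)(i % W);

    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &ch, W, H);
    cudaMemcpy2DToArray(arr, 0, 0, hImg, W * sizeof(float),
                        W * sizeof(float), H, cudaMemcpyHostToDevice);

    // Describe the resource and the sampling behavior (addressing, filtering).
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;    // bilinear filtering in the texture unit
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 1;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* dOut; cudaMalloc(&dOut, N * sizeof(float));
    sampleKernel<<<(N + 127) / 128, 128>>>(tex, dOut, N);

    float hOut[N];
    cudaMemcpy(hOut, dOut, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("sample[0] = %f\n", hOut[0]);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dOut);
    return 0;
}
```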
1-2) How the Texture Cache (L1 cache) works
There are several GTC presentations describing the L1 data cache in detail.
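As a small illustration of how the unified L1/texture data path is reached from CUDA (not of its internal stages), a read-only global load can be issued with the `__ldg()` intrinsic; the kernel below is a hypothetical example, and on recent architectures `const __restrict__` pointers usually let the compiler choose the same path on its own:

```cuda
#include <cuda_runtime.h>

// Minimal sketch: __ldg() issues the load through the read-only data path,
// which is serviced by the unified L1/texture cache (requires compute
// capability 3.5 or later).
__global__ void scale(const float* __restrict__ in, float* __restrict__ out,
                      float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * __ldg(&in[i]);  // read-only load, cached in L1
}
```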
1-3) What functions the ROP unit has and how it works
The feature set of the ROP unit (for example, blending, depth/stencil testing, and multisample resolve) is defined by the Vulkan, DirectX, and OpenGL specifications. NVIDIA does not disclose architecture-specific details.
1-4) When the RT Core is used and how it works
The RT Cores accelerate bounding volume hierarchy (BVH) traversal and ray-triangle intersection testing. Each RT Core is a co-processor attached to an SM and its L1. NVIDIA exposes ray tracing through DirectX Raytracing (DXR), Vulkan Ray Tracing, and OptiX.
- How does the GPU’s general-purpose processing pipeline work?
GPGPU compute was initially implemented through the 3D graphics pipeline. The CUDA architecture was introduced to provide a programming model for data-parallel compute on GPUs. On modern NVIDIA GPUs, general-purpose processing is carried out through the compute engine.
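A minimal sketch of such data-parallel compute: the SAXPY kernel below is launched as a grid of thread blocks and submitted to the GPU's compute engine rather than the 3D engine (kernel name and sizes are arbitrary):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Minimal data-parallel kernel: each thread handles one element.
__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int N = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch a grid of 256-thread blocks on the compute engine.
    saxpy<<<(N + 255) / 256, 256>>>(2.0f, x, y, N);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```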
2-1) LD/ST Operation case
See the GTC presentations referenced above for the L1 data cache. The SMs schedule and execute warps; a warp is a fixed group of 32 threads. Load operations can be dispatched to the constant caches, the L1 data cache, shared memory, distributed shared memory, or the texture unit. In all cases the instruction is converted into wavefronts/packets containing the load size, modifiers, and the addresses for each thread. The memory unit performs the load and returns the data. Store operations are handled in a similar fashion; the presentations above describe the stages inside the L1 data cache in more detail.
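The sketch below shows the typical LD/ST pattern from a kernel's point of view: coalesced global loads staged into shared memory, a barrier, then global stores (it assumes a 256-thread block and an element count that is a multiple of 256; names are illustrative):

```cuda
#include <cuda_runtime.h>

// Each warp's global loads are turned into memory requests by the LSU,
// staged in shared memory, and written back with global stores.
// Reverses each 256-element tile as a simple illustration.
__global__ void reverseTiles(const float* in, float* out, int n) {
    __shared__ float tile[256];                  // one tile per 256-thread block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];               // global load -> shared store
    __syncthreads();                             // wait until the tile is populated
    if (i < n)
        out[i] = tile[blockDim.x - 1 - threadIdx.x]; // shared load -> global store
}
```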
The Hopper architecture introduced a new unit, the Tensor Memory Accelerator (TMA), for efficiently transferring large blocks of data between global and shared memory.
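Whether a given copy is serviced by the TMA depends on the architecture and on how the copy is expressed; Hopper's bulk tensor copies are exposed through lower-level tensor-map interfaces not shown here. As an illustration of the same pattern, the cooperative groups asynchronous copy API below stages a tile from global into shared memory as one block-wide operation (tile size and kernel name are arbitrary; it assumes the element count is a multiple of 256):

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
namespace cg = cooperative_groups;

// Asynchronous bulk copy of one 256-element tile from global to shared memory.
// The copy is issued for the whole thread block and completes at cg::wait();
// on Ampere and later it can be hardware-accelerated.
__global__ void asyncTileCopy(const float* src, float* dst, int n) {
    __shared__ float tile[256];
    cg::thread_block block = cg::this_thread_block();

    // Issue the block-wide asynchronous copy (global -> shared).
    cg::memcpy_async(block, tile, src + blockIdx.x * 256, sizeof(float) * 256);

    cg::wait(block);   // wait for the copy to land in shared memory

    int i = blockIdx.x * 256 + threadIdx.x;
    if (i < n) dst[i] = tile[threadIdx.x] * 2.0f;  // consume the staged data
}
```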
2-2) Arithmetic/Logical/Bitwise Operation case
Each SM has hardware ALU data paths that execute common arithmetic, logical, and bitwise operations.
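For illustration, each statement in the hypothetical kernel below maps onto those data paths: integer add, bitwise AND and shift, comparison/select, and a floating-point fused multiply-add:

```cuda
#include <cuda_runtime.h>

// Each line compiles to instructions dispatched to the SM's ALU/FMA data paths.
__global__ void aluDemo(const int* a, const int* b, int* iout,
                        const float* x, float* fout, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int s   = a[i] + b[i];            // integer arithmetic
        int m   = (a[i] & 0xFF) << 4;     // bitwise AND and shift
        bool gt = a[i] > b[i];            // comparison / predicate
        iout[i] = gt ? s : m;             // select
        fout[i] = fmaf(x[i], 2.0f, 1.0f); // floating-point fused multiply-add
    }
}
```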
2-3) Atomic/Reduce Operation case
The SM shared memory unit and the L2 slices contain ALUs for common atomic and reduction operations.
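A common pattern that exercises both is a two-level reduction: shared-memory atomics within a block, followed by one global atomic per block that is resolved at the L2. The kernel below is a minimal sketch (names arbitrary):

```cuda
#include <cuda_runtime.h>

// Block-level partial sum via shared-memory atomics, then a single global
// atomicAdd per block; the shared-memory atomics are handled inside the SM,
// the global atomic at the L2 slices.
__global__ void twoLevelSum(const float* in, float* total, int n) {
    __shared__ float blockSum;
    if (threadIdx.x == 0) blockSum = 0.0f;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&blockSum, in[i]);            // shared-memory atomic

    __syncthreads();
    if (threadIdx.x == 0) atomicAdd(total, blockSum);  // global (L2) atomic
}
```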