Global memory access

Hello. Tell me how to do this better. In my kernel, each half-warp loads data from its own 128-byte segment in global memory. The array in global memory is aligned to 128 bytes, and the data elements are 64-bit.
In other words, I want each half-warp to fetch its 128 bytes in a single transaction.
Is that feasible? Or is it better to fetch the first 64 bytes (16 threads × 4 bytes each) and then the next 64 bytes — would that be faster?

This post somewhat addresses the issue:

https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-increase-performance-with-vectorized-memory-access/

In general, if you can have each thread load a 128-bit segment (16 bytes), this will usually be faster than loading a 32-bit (4-byte) or 64-bit (8-byte) word per thread.

For 64-bit data, you can perform vectorized loads using the double2 type for floating point or the ulonglong2 type for unsigned integers.
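As a rough sketch (not from the original post), a copy kernel for 64-bit elements using ulonglong2 might look like this; each thread reads one 16-byte vector, so a half-warp of 16 threads moves 256 bytes per load instruction. It assumes the pointers are 16-byte aligned and, for simplicity, that the element count is even:

```cuda
// Sketch: copy 64-bit elements using 128-bit vectorized loads.
// Each thread loads one ulonglong2 (two 64-bit values, 16 bytes).
// Assumes src/dst are 16-byte aligned and n is even.
__global__ void copy_u64_vectorized(unsigned long long *dst,
                                    const unsigned long long *src,
                                    size_t n /* number of 64-bit elements */)
{
    size_t tid  = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t nvec = n / 2;  // number of ulonglong2 elements

    // Reinterpret the 64-bit arrays as arrays of 128-bit vectors.
    const ulonglong2 *src2 = reinterpret_cast<const ulonglong2 *>(src);
    ulonglong2 *dst2       = reinterpret_cast<ulonglong2 *>(dst);

    // Grid-stride loop; each assignment compiles to a single
    // 128-bit load and a single 128-bit store.
    for (size_t v = tid; v < nvec; v += (size_t)gridDim.x * blockDim.x) {
        dst2[v] = src2[v];
    }
}
```

The same pattern works with double2 for floating-point data; the vectorized-memory-access blog post linked above walks through this technique in more detail.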

Thank you very much for the help!