ROP and gpgpu

njuffa · November 13, 2017, 12:55am

You cannot get the full theoretical bandwidth of GDDR5 GPU memory, just as you cannot get the full bandwidth of DDR4 system memory in a benchmark. Expect to max out at around 80% of the theoretical bandwidth. The rules for maximum bandwidth are basically: (1) all accesses are coalesced; (2) each thread makes 128-bit accesses (the best use of the limited-depth load/store queue). The simple kernel below does that (configure to taste, e.g. blocks = 65520, threads/block = 128, len = 100000000).

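// Grid-stride copy of 'len' double2 elements; each thread moves 128 bits per access
__global__ void zcopy (const double2 * __restrict__ src, double2 * __restrict__ dst, int len)
{
    int stride = gridDim.x * blockDim.x;              // total number of threads in the grid
    int tid = blockDim.x * blockIdx.x + threadIdx.x;  // global thread index
    // adjacent threads touch adjacent elements, so accesses are coalesced
    for (int i = tid; i < len; i += stride) {
        dst[i] = src[i];
    }
}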
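
To compare a measurement against the roughly 80% figure, the kernel needs a host-side timing harness. The following is a minimal sketch, not part of the original post: the `CHECK` macro, the `effective_bandwidth_gbs` helper, the warm-up launch, and the repetition count `reps` are assumptions added for illustration. It uses the launch configuration suggested above and counts each copied element twice (one read plus one write). With len = 100000000 the two buffers need about 3.2 GB of device memory, so reduce `len` on smaller cards.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro (not from the post).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Kernel from the post.
__global__ void zcopy (const double2 * __restrict__ src, double2 * __restrict__ dst, int len)
{
    int stride = gridDim.x * blockDim.x;
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    for (int i = tid; i < len; i += stride) {
        dst[i] = src[i];
    }
}

// Hypothetical helper: bytes moved in 'ms' milliseconds, in GB/s (1 GB = 1e9 bytes).
static double effective_bandwidth_gbs (double bytes, double ms)
{
    return bytes / (ms * 1.0e6);
}

int main (void)
{
    const int len = 100000000;   // number of double2 elements, as suggested in the post
    const int blocks = 65520;
    const int threads = 128;
    const int reps = 10;         // assumed repetition count for averaging
    const size_t bytes = (size_t)len * sizeof(double2);

    double2 *src = 0, *dst = 0;
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMalloc(&dst, bytes));
    CHECK(cudaMemset(src, 0x5a, bytes));   // arbitrary fill pattern

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up launch so one-time startup cost is excluded from the timing.
    zcopy<<<blocks, threads>>>(src, dst, len);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaEventRecord(start));
    for (int r = 0; r < reps; r++) {
        zcopy<<<blocks, threads>>>(src, dst, len);
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Each launch reads 'bytes' and writes 'bytes'.
    double gbs = effective_bandwidth_gbs(2.0 * (double)bytes * reps, ms);
    printf("zcopy: %.1f GB/s effective (read + write)\n", gbs);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(src));
    CHECK(cudaFree(dst));
    return EXIT_SUCCESS;
}
```

Build with something like `nvcc -O3 -o zcopy zcopy.cu` and compare the printed figure against the theoretical bandwidth listed in the specifications of your card.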

Note that various performance issues have been reported with GDDR5X memory (search this forum for details).
