Speed - Global memory access vs. bitwise operation

perry · October 23, 2011, 3:40pm

Which is faster in CUDA 4.0 on a 2.0 device:

reading and writing global memory, or bitwise operations (shifts) at runtime, repeated many times in a loop? The operations would be on 64-bit numbers (or even "simulated" 128-bit numbers, using a struct).

So far I am using global memory, but the efficiency is really poor. My stack is stored in global memory, but it only holds 4-bit numbers (each stored as a char), so I thought of storing the stack in a 64-bit number and using logical operations instead of array accesses to global memory. A 64-bit number used as a stack of 4-bit numbers gives a capacity of 16, which is enough for me in most cases.
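For the "simulated" 128-bit case mentioned above, a minimal sketch of a 4-bit shift across two 64-bit words might look like the following. The names `U128`, `shl4`, `shr4` and the `HD` macro are hypothetical, not from any CUDA header, and the code assumes `unsigned long long` is exactly 64 bits wide (which holds for CUDA targets). Such a pair would hold 32 four-bit items.

```cpp
// HD expands to CUDA qualifiers under nvcc and to nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical 128-bit value built from two 64-bit halves.
struct U128 {
    unsigned long long hi;  // upper 64 bits
    unsigned long long lo;  // lower 64 bits
};

// Shift left by one 4-bit item: the top nibble of lo carries into hi.
HD inline void shl4(U128 &x) {
    x.hi = (x.hi << 4) | (x.lo >> 60);
    x.lo <<= 4;
}

// Shift right by one 4-bit item: the bottom nibble of hi carries into lo.
HD inline void shr4(U128 &x) {
    x.lo = (x.lo >> 4) | (x.hi << 60);
    x.hi >>= 4;
}
```

Each shift costs a handful of integer instructions on values that can stay in registers, so no memory traffic is involved as long as the compiler keeps the struct in registers.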

wlangdon · October 24, 2011, 7:53pm

Do you have to use global memory?

In the past I placed the stack in shared memory (each thread had its own stack).

http://www.cs.bham.ac.uk/~wbl/biblio/gp-html/langdon_2010_eurogp.html

Today I would be tempted to place each thread’s stack in local memory and rely on the top of the stack being in cache most of the time.
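A rough sketch of the local-memory idea, assuming "2.0 device" means compute capability 2.0 (where local memory accesses go through the L1/L2 caches): each thread declares a small private array, and because it is indexed dynamically the compiler will typically place it in local memory. The names `ArrayStack`, `array_push`, `array_pop`, `reverse_with_stack`, `reverse_kernel` and the capacity of 16 are made up for illustration.

```cpp
// HD expands to CUDA qualifiers under nvcc and to nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

enum { MAX_DEPTH = 16 };  // assumed capacity, matching the 16 items in the question

// Per-thread stack; one instance is declared inside each thread's code.
struct ArrayStack {
    unsigned char item[MAX_DEPTH];
    int top;  // number of items currently stored
};

HD inline bool array_push(ArrayStack &s, unsigned char v) {
    if (s.top >= MAX_DEPTH) return false;  // full, item is not stored
    s.item[s.top++] = v;
    return true;
}

HD inline bool array_pop(ArrayStack &s, unsigned char &out) {
    if (s.top <= 0) return false;  // empty
    out = s.item[--s.top];
    return true;
}

// Toy workload: push n items, then pop them all, which reverses their order.
HD inline void reverse_with_stack(const unsigned char *in, unsigned char *out, int n) {
    ArrayStack s;
    s.top = 0;
    for (int i = 0; i < n; ++i) array_push(s, in[i]);
    unsigned char v;
    int k = 0;
    while (array_pop(s, v)) out[k++] = v;
}

#ifdef __CUDACC__
// Each thread works on its own slice of n items with its own private stack.
__global__ void reverse_kernel(const unsigned char *in, unsigned char *out,
                               int n, int num_threads) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < num_threads) reverse_with_stack(in + tid * n, out + tid * n, n);
}
#endif
```

Whether this beats the global-memory version depends on how well the stack tops stay in cache, so it would need to be profiled on the actual kernel.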

If that does not work: how deep does your stack have to be? If you only have 4 bits per item, could you fake a stack by shifting a register or two left or right by 4 bits for each push or pop?
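The register-shifting idea could be sketched roughly as below: one 64-bit word holds up to 16 four-bit items, a push shifts left by 4 bits and a pop shifts right by 4 bits. The names `NibbleStack`, `nibble_push`, `nibble_pop` and the `HD` macro are hypothetical, and the separate depth counter is an added assumption so that a stored value of 0 can be told apart from an empty slot.

```cpp
// HD expands to CUDA qualifiers under nvcc and to nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Up to 16 four-bit items packed into one 64-bit word.
struct NibbleStack {
    unsigned long long bits;  // top of stack lives in the low 4 bits
    int depth;                // number of items currently stored (0..16)
};

// Push: shift existing items up by 4 bits and put the new item in the low nibble.
HD inline bool nibble_push(NibbleStack &s, unsigned v) {
    if (s.depth >= 16) return false;  // full, caller decides how to handle it
    s.bits = (s.bits << 4) | (unsigned long long)(v & 0xFu);
    ++s.depth;
    return true;
}

// Pop: read the low nibble, then shift the remaining items down by 4 bits.
HD inline bool nibble_pop(NibbleStack &s, unsigned &out) {
    if (s.depth <= 0) return false;  // empty
    out = (unsigned)(s.bits & 0xFULL);
    s.bits >>= 4;
    --s.depth;
    return true;
}
```

A thread would start with `NibbleStack s = {0ULL, 0};` and call `nibble_push(s, x)` and `nibble_pop(s, x)` in its loop. If the compiler keeps `s` in registers, each operation is a few integer instructions with no memory access at all.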

Bill

Dr. W. B. Langdon,

    Department of Computer Science,

    University College London

    Gower Street, London WC1E 6BT, UK

    http://www.cs.ucl.ac.uk/staff/W.Langdon/

