GPU cache: Optimizing for cache size on Win7 and Leopard

dailydols · January 26, 2010, 9:02pm

I have a running mean-shift clustering code which is (on a Mac with Leopard, LLVM) slower on the GPU than on the CPU.

I hoped to fix the problem with some cache optimizations, using clGetDeviceInfo with the parameter CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
but unfortunately the function returns 0 bytes for the GPU cache.
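
For reference, a minimal host-side sketch of the query described above. It assumes the first platform and its first GPU device are the ones of interest (real code should enumerate them) and it also reads CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, which reports CL_NONE when the device exposes no global memory cache. Error checking on the individual queries is left out for brevity.

```c
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;

    /* Assumption: first platform, first GPU. Real code should enumerate. */
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    cl_device_mem_cache_type cache_type = CL_NONE;
    cl_ulong cache_size = 0;   /* bytes */
    cl_uint line_size = 0;     /* bytes */

    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
                    sizeof(cache_type), &cache_type, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
                    sizeof(cache_size), &cache_size, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
                    sizeof(line_size), &line_size, NULL);

    const char *type_name =
        cache_type == CL_NONE            ? "none" :
        cache_type == CL_READ_ONLY_CACHE ? "read-only" : "read-write";

    printf("global mem cache type: %s\n", type_name);
    printf("global mem cache size: %llu bytes\n", (unsigned long long)cache_size);
    printf("global mem cache line: %u bytes\n", (unsigned)line_size);
    return 0;
}
```

A cache type of "none" together with a size of 0 means the driver is reporting that no global memory cache exists, rather than failing the query.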

On the CPU the optimization makes sense, because the function reports the CPU’s actual 2 MB of cache.
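
As an illustration of how a reported cache size might be turned into a block size, here is a small sketch in plain C. The helper name `points_per_block`, the choice to use only half of the cache, and the fallback value are all assumptions made for this example; they are not taken from the original code.

```c
#include <stddef.h>

/* Hypothetical helper: number of points to process per block so that the
 * block fits into half of the reported cache. Returns `fallback` when the
 * device reports no cache (0 bytes), as the GPU does in this thread. */
static size_t points_per_block(unsigned long long cache_bytes,
                               size_t bytes_per_point,
                               size_t fallback)
{
    if (cache_bytes == 0 || bytes_per_point == 0)
        return fallback;

    /* Use half the cache so the block does not evict everything else. */
    unsigned long long n = (cache_bytes / 2) / bytes_per_point;
    return n > 0 ? (size_t)n : fallback;
}
```

With a 2 MB cache and points stored as four floats (16 bytes), this gives 65536 points per block; with a reported size of 0 it falls back to whatever default the caller passes in.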

Don’t GPUs have a cache, or is this just another missing (not yet implemented) feature?
As far as I know they do, even if it is just a few kilobytes, which would help!

Greetz,

Konstantin

o.stava · January 26, 2010, 9:19pm

In the current generation of GPUs, a cache is used only when accessing data from constant memory or from image memory via a sampler. I think the global memory cache queried by the above call refers to a cache used when you access global memory through buffer memory objects. In that case there really is no cache at all (see the NVIDIA documentation). I think I’ve read somewhere that some global memory cache might be included in GPUs based on the Fermi architecture, but for now I would suggest storing your data in an image object and accessing it through sampler objects.
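
A minimal sketch of what reading through a sampler could look like in an OpenCL C kernel. It assumes the data has been uploaded as a 2D image with a float format (for example CL_RGBA with CL_FLOAT); the kernel name `sum_neighbours` and its parameters are made up for illustration. This is device code that is built with clBuildProgram, not with a host compiler.

```c
/* OpenCL C device code (illustrative sketch). */
const sampler_t nearest_sampler = CLK_NORMALIZED_COORDS_FALSE |
                                  CLK_ADDRESS_CLAMP_TO_EDGE   |
                                  CLK_FILTER_NEAREST;

/* Sums all texels in a (2*radius+1)^2 window around each pixel.
 * Reads go through the image path, so they can use the texture cache. */
__kernel void sum_neighbours(__read_only image2d_t points,
                             __global float4 *out,
                             int radius)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int width = get_image_width(points);
    int height = get_image_height(points);
    if (x >= width || y >= height)
        return;

    float4 acc = (float4)(0.0f);
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx)
            acc += read_imagef(points, nearest_sampler,
                               (int2)(x + dx, y + dy));

    out[y * width + x] = acc;
}
```

Neighbouring work-items read overlapping windows, which is the access pattern the image cache is meant for. Whether this actually beats plain buffer reads depends on the device and has to be measured.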

dailydols · January 26, 2010, 10:20pm

Thank you!

I have no Fermi available (yet), so hopefully it is just another driver thing.
Let’s wait and see…

parallelis · February 8, 2010, 11:27pm

In fact there’s something really close to a CPU cache: the shared memory, called local memory in OpenCL (implemented in NVIDIA GeForce 8xxx and later, as well as in ATI Radeon 56xx and up).

It’s 16 KB of local memory shared by each group of 8 SPs attached to an SM on the GeForce architecture.
You should look it up in the CUDA threads and in the CUDA documentation. It is not a cache; it is local memory that is far faster than the video card’s main global memory, in both latency and bandwidth.
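
Because this memory is managed by hand rather than filled automatically, the kernel has to stage data into it explicitly. Below is a sketch of the usual tiling pattern in OpenCL C, applied to a flat-kernel neighbour sum of the kind a mean-shift step needs. The kernel name, the flat kernel, and the convention that each point stores w = 1 (so that the w component of the result counts neighbours) are assumptions for this example, not the original poster's code.

```c
/* OpenCL C device code (illustrative sketch).
 * Each work-group copies one tile of points into __local memory, then
 * every work-item in the group reads that tile from there instead of
 * fetching the same points from global memory again. */
__kernel void accumulate_tiled(__global const float4 *points,
                               __global float4 *sums,
                               __local float4 *tile, /* one float4 per work-item */
                               int n,
                               float bandwidth2)     /* squared bandwidth */
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    int lsz = get_local_size(0);

    float4 self = (gid < n) ? points[gid] : (float4)(0.0f);
    float4 acc = (float4)(0.0f);

    for (int base = 0; base < n; base += lsz) {
        /* Cooperative load: one point per work-item. */
        int src = base + lid;
        tile[lid] = (src < n) ? points[src] : (float4)(0.0f);
        barrier(CLK_LOCAL_MEM_FENCE);

        int count = min(lsz, n - base);
        for (int j = 0; j < count; ++j) {
            float4 d = tile[j] - self;
            if (dot(d, d) <= bandwidth2)
                acc += tile[j];
        }
        /* Wait before the tile is overwritten in the next iteration. */
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (gid < n)
        sums[gid] = acc;
}
```

On the host, the `tile` argument is sized with clSetKernelArg(kernel, 2, local_size * sizeof(cl_float4), NULL), and the available amount can be read with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE. With 16 KB of local memory, a tile of float4 values can hold at most 1024 points.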
