CUDA: How do I use L2 cache in Fermi?

I need to be able to transfer data between different blocks, but without using global memory because it’s way too slow. The NVIDIA Fermi cards have a new L2 cache that is shared across the whole GPU. Is it possible to use this cache, and if so, how?
Thanks

Hi Tom,

The L2 cache is hardware managed, so you don’t have direct control over how data is stored in it. I’ve been poking around the web to see if I can find details on how the L2 caching policy works, but haven’t found anything concrete. As best I can tell, data in the L2 cache is visible to all executing blocks, and I assume the hardware will cache frequently used data, especially data that is accessed across blocks.
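
One thing you can do programmatically is check how much L2 the device actually reports. A minimal sketch, assuming a CUDA toolkit recent enough to expose the l2CacheSize field of cudaDeviceProp:

```
// Minimal sketch: query the device's reported L2 cache size at runtime.
// Assumes cudaDeviceProp::l2CacheSize is available in your toolkit version.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0, just for illustration
    printf("%s: L2 cache size = %d bytes\n", prop.name, prop.l2CacheSize);
    return 0;
}
```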

  • Mat

I posted a similar question in the NVIDIA forums: The Official NVIDIA Forums | NVIDIA. Apparently the only control you have over the L2 cache is through cache modifiers on PTX instructions, which you can use from CUDA C via inline PTX, roughly as sketched below.
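
For reference, here is a minimal sketch I put together of what that inline PTX looks like; it isn’t from the NVIDIA thread itself. The helper names are mine, and it assumes sm_20 (Fermi) and 64-bit pointers. On Fermi, ld.global.ca caches in both L1 and L2 (the default), while ld.global.cg caches in L2 only.

```
// Hedged sketch: PTX cache operators via inline asm (helper names are mine).
__device__ float load_ca(const float *ptr)
{
    float val;
    // .ca = cache at all levels (L1 and L2), the default load policy on Fermi
    asm volatile("ld.global.ca.f32 %0, [%1];" : "=f"(val) : "l"(ptr));
    return val;
}

__device__ float load_cg(const float *ptr)
{
    float val;
    // .cg = cache globally, i.e. in L2 only, bypassing L1
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(val) : "l"(ptr));
    return val;
}
```

The same L2-only load policy can apparently also be requested for a whole compilation unit with -Xptxas -dlcm=cg, instead of writing inline PTX.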

What I’m hoping is that by not going over the 768KB limit, every read/write I make to global memory will go through the L2 cache. I’ll be frequently transferring data between blocks by writing to a global memory array and then having the other blocks read from it, roughly as sketched below. Is it wrong to assume that the L2 cache will always be used if I don’t go over 768KB?
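
To make it concrete, here is a rough sketch of the pattern I have in mind; the names are made up and I’m not claiming this is the right way to do it. Block 0 writes the shared array, fences, and raises a flag, and the other blocks spin on the flag before reading. The volatile qualifiers are there so the accesses go through L2 rather than a possibly stale L1 line, and the spin only finishes if block 0 is actually resident on the device.

```
// Rough sketch of the inter-block exchange (illustrative names only).
__global__ void exchange_kernel(volatile float *shared_data,
                                volatile int *ready, int n)
{
    int tid = (int)threadIdx.x;

    if (blockIdx.x == 0) {
        // Producer block: publish the data, then the ready flag.
        if (tid < n)
            shared_data[tid] = tid * 2.0f;
        __threadfence();              // order the data writes before the flag
        __syncthreads();
        if (tid == 0)
            *ready = 1;
    } else {
        // Consumer blocks: wait for the flag, then read what block 0 wrote.
        if (tid == 0)
            while (*ready == 0) { }   // busy-wait on the flag
        __syncthreads();
        if (tid < n) {
            float v = shared_data[tid];   // hopefully served from L2
            // ... use v in this block ...
            (void)v;
        }
    }
}
```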

Is it wrong to assume that the L2 cache will always be used if I don’t go over 768KB?

Unfortunately, I don’t know. Hopefully the NVIDIA forums can be more helpful with this one.

  • Mat