GK110 supports a new cache configuration mode in which L1 cache and shared memory are split 32:32 (32 KB each).
References:
Kepler Tuning Guide (CUDA Toolkit Documentation)
and slide 29 of:
[url]http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0514-GTC2012-GPU-Performance-Analysis.pdf[/url]
However, the CUDA reference manual lists only the older 16:48 and 48:16 splits, set with cudaDeviceSetCacheConfig or cudaFuncSetCacheConfig (pages 23 and 52, respectively, of the Toolkit Reference Manual).
How do I set the 32:32 split?
For the Runtime API, the equal split is cudaFuncCachePreferEqual in the cudaFuncCache enum:
/**
 * CUDA function cache configurations
 */
enum __device_builtin__ cudaFuncCache
{
    cudaFuncCachePreferNone   = 0,  /**< Default function cache configuration, no preference */
    cudaFuncCachePreferShared = 1,  /**< Prefer larger shared memory and smaller L1 cache */
    cudaFuncCachePreferL1     = 2,  /**< Prefer larger L1 cache and smaller shared memory */
    cudaFuncCachePreferEqual  = 3   /**< Prefer equal size L1 cache and shared memory */
};
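A minimal sketch of how that enum value might be used with the two Runtime API calls named in the question. The kernel scaleKernel is hypothetical and exists only so there is a function to configure. The setting is a preference, so the runtime may choose a different configuration if it needs to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is a function to configure.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Device-wide preference: request the equal L1 / shared memory split.
    cudaError_t err = cudaDeviceSetCacheConfig(cudaFuncCachePreferEqual);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetCacheConfig: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Per-kernel preference: takes precedence over the device-wide setting
    // when this kernel is launched.
    err = cudaFuncSetCacheConfig(scaleKernel, cudaFuncCachePreferEqual);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncSetCacheConfig: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the device-wide preference back as a sanity check.
    cudaFuncCache current;
    err = cudaDeviceGetCacheConfig(&current);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetCacheConfig: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("device cache config = %d (cudaFuncCachePreferEqual = %d)\n",
           (int)current, (int)cudaFuncCachePreferEqual);
    return 0;
}
```

Compile with nvcc. On devices where the L1 and shared memory sizes are fixed, these calls have no effect.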
… and for the Driver API, CU_FUNC_CACHE_PREFER_EQUAL in the CUfunc_cache enum:
/**
 * Function cache configurations
 */
typedef enum CUfunc_cache_enum {
    CU_FUNC_CACHE_PREFER_NONE   = 0x00,  /**< no preference for shared memory or L1 (default) */
    CU_FUNC_CACHE_PREFER_SHARED = 0x01,  /**< prefer larger shared memory and smaller L1 cache */
    CU_FUNC_CACHE_PREFER_L1     = 0x02,  /**< prefer larger L1 cache and smaller shared memory */
    CU_FUNC_CACHE_PREFER_EQUAL  = 0x03   /**< prefer equal sized L1 cache and shared memory */
} CUfunc_cache;
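A rough Driver API counterpart, assuming device 0 and its primary context (both are illustrative choices, not something the thread specifies). It sets the context-wide preference with cuCtxSetCacheConfig. For a single kernel, cuFuncSetCacheConfig takes a CUfunction handle from a loaded module, which is only indicated in a comment here.

```cuda
#include <cstdio>
#include <cuda.h>

// Small helper: report a failed Driver API call and return nonzero.
static int failed(CUresult res, const char *what)
{
    if (res == CUDA_SUCCESS) {
        return 0;
    }
    fprintf(stderr, "%s failed with CUresult %d\n", what, (int)res);
    return 1;
}

int main()
{
    CUdevice dev;
    CUcontext ctx;

    if (failed(cuInit(0), "cuInit")) return 1;
    if (failed(cuDeviceGet(&dev, 0), "cuDeviceGet")) return 1;  // assumes device 0

    // Use the device's primary context and make it current on this thread.
    if (failed(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain")) return 1;
    if (failed(cuCtxSetCurrent(ctx), "cuCtxSetCurrent")) return 1;

    // Context-wide preference: request the equal L1 / shared memory split.
    if (failed(cuCtxSetCacheConfig(CU_FUNC_CACHE_PREFER_EQUAL), "cuCtxSetCacheConfig")) return 1;

    // For one kernel, with a CUfunction hfunc obtained from a loaded module:
    //     cuFuncSetCacheConfig(hfunc, CU_FUNC_CACHE_PREFER_EQUAL);

    // Read the context-wide preference back as a sanity check.
    CUfunc_cache current;
    if (failed(cuCtxGetCacheConfig(&current), "cuCtxGetCacheConfig")) return 1;
    printf("context cache config = %d (CU_FUNC_CACHE_PREFER_EQUAL = %d)\n",
           (int)current, (int)CU_FUNC_CACHE_PREFER_EQUAL);

    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```

Compile with nvcc and link against the driver library (-lcuda). As with the Runtime API, the value is a preference rather than a guarantee.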