Turing - accessing 64KB shared mem from PyCuda (driver api)

With CUDA 10.1 + PyCUDA and an RTX 2060 Super, I’m trying to access the additional available shared memory, i.e. overcome the 48KB-per-threadblock default and obtain access to the full 64KB of shared memory.

I understand that this can be done using the Runtime API.

But so far (in my relative inexperience) I have not been able to work out how it can be done:
A) With the Driver API
B) Using PyCuda (which uses the Driver API)

To try to be clearer, I believe that I am dynamically allocating the shared memory:

This snippet of kernel code (featuring 48KB shared memory) compiles fine:

__shared__ uint smem[12288];

…but this does not:

__shared__ uint smem[12289];

(this being 4 bytes beyond 48KB)

That is not a dynamic allocation of shared memory; it is a static allocation, and static shared memory is limited to 48KB per threadblock at compile time, which is why the second snippet fails to compile.

A dynamic allocation looks like this:

extern __shared__ uint smem[];

and it requires a kernel execution configuration argument to specify how much shared memory (in bytes) should be provided at launch — the optional third argument in the CUDA C++ `<<<...>>>` syntax, which PyCuda exposes as the `shared=` keyword argument when launching a kernel.
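As a minimal PyCuda sketch of that pattern (the kernel and variable names here are illustrative, not from your code), a dynamic allocation within the default 48KB looks like this:

```python
import numpy as np
import pycuda.autoinit          # creates a context on the first available GPU
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule(r"""
extern __shared__ unsigned int smem[];   // size fixed at launch, not at compile time

__global__ void fill(unsigned int *out, int n)
{
    int i = threadIdx.x;
    if (i < n) {
        smem[i] = i;
        __syncthreads();
        out[i] = smem[i];
    }
}
""")

fill = mod.get_function("fill")
n = 256
out = np.zeros(n, dtype=np.uint32)
# `shared=` is the dynamic shared memory size in bytes for this launch;
# it plays the role of the third <<<...>>> execution configuration argument.
fill(drv.Out(out), np.int32(n), block=(n, 1, 1), grid=(1, 1), shared=n * 4)
```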

You’ll also want to set a function attribute on Volta and Turing to opt in to more than 48KB of dynamic shared memory per threadblock — in the driver API this is `cuFuncSetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES` (note that Turing caps out at 64KB per block):

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-7-x
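Putting it together from PyCuda, a sketch might look like the following. This assumes a PyCuda version whose `Function.set_attribute` wraps the driver call `cuFuncSetAttribute`; if your build doesn’t expose it, you would need to update PyCuda or make the driver call another way. The kernel and names are illustrative:

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule(r"""
extern __shared__ unsigned int smem[];

__global__ void touch(unsigned int *out)
{
    smem[threadIdx.x] = threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = smem[threadIdx.x];
}
""")

touch = mod.get_function("touch")

nbytes = 64 * 1024   # above the 48KB default; the per-block cap on Turing
# Opt in to the larger dynamic shared memory size for this kernel
# (assumed wrapper for cuFuncSetAttribute /
#  CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES).
touch.set_attribute(
    drv.function_attribute.MAX_DYNAMIC_SHARED_SIZE_BYTES, nbytes)

out = np.zeros(256, dtype=np.uint32)
touch(drv.Out(out), block=(256, 1, 1), grid=(1, 1), shared=nbytes)
```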

That should be all that is needed.