How do I use 48 KB of shared memory on a GTX 480?

As is well known, the GTX 480 defaults to 48 KB of shared memory with a 16 KB L1 cache. But my app for the GTX 480, which uses more than 16 KB of shared memory, fails with an error like this:

uses too much shared data (0xa09c bytes + 0x10 bytes system, 0x4000 max)

The max value is 0x4000, which is just 16 KB… I then used the CUDA runtime function cudaFuncSetCacheConfig to force 48 KB of shared memory for my kernel function, but it didn’t help…
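For reference, here is a minimal sketch of the kind of call described above. The kernel name bigSharedKernel and the constant kTileElems are hypothetical, chosen only so that the static shared array (40960 bytes) lands between the 16 KB and 48 KB limits, much like the 0xa09c = 41116 bytes in the error message. Note that the "uses too much shared data" message is printed by ptxas at build time, so a runtime call such as cudaFuncSetCacheConfig cannot make it go away; this sketch assumes the file is built for compute capability 2.0 (for example with -arch=sm_20).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical size: 10240 floats = 40960 bytes of static shared memory,
// above the 16 KB (0x4000) limit in the error message but below 48 KB.
const int kTileElems = 10240;

// Hypothetical kernel that simply fills and reads a large shared array.
__global__ void bigSharedKernel(float *out)
{
    __shared__ float tile[kTileElems];

    // Each thread fills a strided slice of the tile.
    for (int i = threadIdx.x; i < kTileElems; i += blockDim.x)
        tile[i] = static_cast<float>(i);
    __syncthreads();

    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[kTileElems - 1];
}

int main()
{
    float *d_out = NULL;
    cudaError_t err = cudaMalloc(&d_out, sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask for the larger shared memory / smaller L1 split for this kernel.
    err = cudaFuncSetCacheConfig(bigSharedKernel, cudaFuncCachePreferShared);
    if (err != cudaSuccess) {
        printf("cudaFuncSetCacheConfig failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    bigSharedKernel<<<1, 256>>>(d_out);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float h_out = 0.0f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("last tile element: %.1f\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

If the same file is built for an sm_1x target, ptxas should reject it with the same kind of "0x4000 max" message, regardless of the cache configuration call.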

Has anyone else encountered the same problem?

Thanks… ^_^

As a first guess, try compiling with -arch=sm_20.

I’ve tried this, but it didn’t help… Thanks ^_^

Btw, was the shared memory size per block increased? Or just the total amount?

The shared memory per multiprocessor increases to 48KB, so we can use more in our block.
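One way to check the per-block limit on a given card is to query the device properties. This is only a sketch: the helper name sharedBytesPerBlock is made up for illustration, and it assumes the GTX 480 is device 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the per-block shared memory limit in bytes
// for the given device, or 0 if the device cannot be queried.
size_t sharedBytesPerBlock(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess)
        return 0;
    return prop.sharedMemPerBlock;
}

int main()
{
    int device = 0;  // assumption: the GTX 480 is device 0

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("shared memory per block: %zu bytes\n", sharedBytesPerBlock(device));
    return 0;
}
```

On a compute capability 2.0 device the reported per-block value would be expected to be 48 KB rather than 16 KB, but the build still has to target sm_20 for ptxas to accept kernels that use it.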

The correct way is to change CUFILES to CUFILES_sm_20 in your own Makefile…

I have a question: are you also running the GTX 480 as an X display? If so, stop the X server and try your cache preference again.

Hi, I have exactly the same problem as you. How did you finally deal with it?

Thanks

You need to compile with the flag -arch=sm_20

Thanks for the reply. Could you please give me some specific hints on how to change this flag in the <common.mk> file?

Thanks

Yes, do not use the common.mk…

Write your own makefile, it is very simple and you have full control.

Hi mfatica,

I wrote a simple makefile as follows (not sure whether it’s correct). I did set the -arch=sm_20 option, but when I run make the problem is still there: ptxas error : Entry function ‘_Z6kernelPcP5bit_tii’ uses too much shared data (0x8018 bytes + 0x10 bytes system, 0x4000 max)

NVCCFLAGS := -O3 -arch=sm_20
NVCC := /usr/local/cuda/bin/nvcc
LD_LIBRARY_PATH := /usr/local/cuda/lib64

all: main.cu Makefile
	$(NVCC) -o creatures main.cu $(NVCCFLAGS)

clean:
	rm -rf creatures

run: all
	./creatures

include …/…/common/common.mk

Thanks

Try this NVCCFLAGS:

-O3 --ptxas-options=-v -arch sm_20

Do not include the common.mk; it is probably overriding the flags you just set.