Double Precision Flags

Hi folks,

I wrote some code that uses all of the available shared memory (16 KB) and works with double-precision values. I compile it with the option -arch compute_13, and that goes through without a problem. The issue arises when I add the flag -code sm_13 to the previous one. ptxas then reports the following error:

"ptxas error : Entry function '_Z16ldldecompositionPd' uses too much shared data (0x4008 bytes + 0x10 bytes system, 0x4000 max)"

Can anybody explain what is happening?

M.Meenakshi Sundaram

Shared memory usage is made up of:

(1) gridDim, blockDim, and blockIdx: 16 bytes

(2) kernel parameters

(3) user-defined shared memory, static + dynamic

So you cannot use the full 16 KB, only a little less than 16 KB.
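To make the numbers in the error message concrete, here is a minimal sketch of the arithmetic in plain C++ (it should also compile as host code under nvcc). The three constants are copied from the ptxas message above. The helper names are hypothetical, and the 8-byte parameter size is an assumption: the mangled name `_Z16ldldecompositionPd` demangles to `ldldecomposition(double*)`, so a single 8-byte pointer would account for the extra 0x8 bytes, but the thread does not confirm that.

```cpp
#include <cstddef>

// Figures copied from the ptxas message quoted in the thread.
constexpr std::size_t kSharedLimitBytes  = 0x4000; // 16384 bytes, the reported maximum
constexpr std::size_t kSystemBytes       = 0x10;   // 16 bytes reported as "system"
constexpr std::size_t kReportedUserBytes = 0x4008; // 16392 bytes reported for the kernel

// Bytes left for everything the kernel itself needs:
// kernel parameters plus static and dynamic shared arrays.
constexpr std::size_t userBudgetBytes() {
    return kSharedLimitBytes - kSystemBytes;
}

// How far the reported kernel is over the limit.
constexpr std::size_t overshootBytes() {
    return kReportedUserBytes + kSystemBytes - kSharedLimitBytes;
}

// Largest number of doubles that still fits once paramBytes of kernel
// parameters are accounted for. Assumption: parameters are charged to
// shared memory, as item (2) in the list above says.
constexpr std::size_t maxDoubles(std::size_t paramBytes) {
    return (userBudgetBytes() - paramBytes) / sizeof(double);
}

static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");
```

Under these assumptions, `userBudgetBytes()` is 16368, `overshootBytes()` is 24, and `maxDoubles(8)` is 2045, so a shared array of 2048 doubles (exactly 16 KB) is three elements too large for a kernel that takes one pointer argument.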

But why did it compile with only the -arch… flag?

By the way, it compiled cleanly with both flags when I used less shared memory. But I still want to get to the bottom of this, so could someone please give me an idea?