I have a DirectCompute shader using TGSM (thread group shared memory) that does something like the following:
groupshared float4 sm[GROUP_SIZE]; // GROUP_SIZE is a placeholder for the actual array size
sm[thread_idx] = v;
// use sm value, everything is good here
sm[thread_idx] = v2;
// use sm values again, values are corrupted now
So the second time I write to and then read from the same TGSM area, the values I read back are wrong. If I use a second TGSM area for the second usage, everything works fine. I’ve read the advice (http://www.nvidia.com/content/GTC-2010/pdfs/2260_GTC2010.pdf) to try to write to and read from TGSM only once, but nothing there says that reusing it won’t work. Any thoughts? I am not bottlenecked by TGSM storage but by register usage, so this is not a big problem (for now), but I would still like to know why I cannot reuse a TGSM area within a shader. This is with cs_5_0 on a GTX 980 Ti with the 361.43 drivers.
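For reference, here is a minimal sketch of the reuse pattern with explicit barriers. It rests on assumptions that are not in the snippet above: that "use sm values" means each thread reads slots written by other threads, a hypothetical group size of 64 (`GROUP_SIZE`), a hypothetical output buffer `result`, and placeholder values for `v` and `v2`. Under those assumptions, a barrier is needed not only after each write but also between the last read of the first phase and the first write of the second phase; otherwise a thread that runs ahead can overwrite a slot another thread has not read yet. Whether this is what happens in the actual shader is not certain.

```hlsl
// Hypothetical group size, chosen only for illustration
#define GROUP_SIZE 64

// Hypothetical output buffer so the sketch is a complete compute shader
RWStructuredBuffer<float4> result : register(u0);

groupshared float4 sm[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void main(uint thread_idx : SV_GroupIndex, uint3 dtid : SV_DispatchThreadID)
{
    // Assumption: each thread reads a slot written by a different thread
    uint neighbor = (thread_idx + 1) % GROUP_SIZE;

    // First use of the shared area
    float4 v = float4((float)dtid.x, 0.0f, 0.0f, 0.0f); // placeholder value
    sm[thread_idx] = v;
    GroupMemoryBarrierWithGroupSync(); // all first-phase writes are visible
    float4 a = sm[neighbor];

    // All threads must finish reading before any thread overwrites sm
    GroupMemoryBarrierWithGroupSync();

    // Second use of the same shared area
    float4 v2 = a * 2.0f; // placeholder value
    sm[thread_idx] = v2;
    GroupMemoryBarrierWithGroupSync(); // all second-phase writes are visible
    float4 b = sm[neighbor];

    result[dtid.x] = a + b;
}
```

If each thread only ever reads back its own slot, none of these barriers should be required, and the corruption would need a different explanation.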