DirectCompute: rewriting to TGSM

I have a DirectCompute shader using TGSM (thread group shared memory) that does something like the following:

groupshared float4 sm[64];

[numthreads(64, 1, 1)]
void main(uint thread_idx : SV_GroupIndex)
{
    // ... v and v2 are computed here ...
    sm[thread_idx] = v;
    // use sm values, everything is good here
    sm[thread_idx] = v2;
    // use sm values again, values are corrupted now
}

So the second time I write to and then read from the same TGSM array, it somehow doesn't work correctly. If I use a second TGSM array for the second use, everything works fine. I've read advice to write/read TGSM only once, but nothing saying that reusing it won't work. Any thoughts?

I am not bottlenecked by TGSM storage but by register usage, so this is not a big problem (for now), but I would still like to know why I cannot reuse a TGSM area in a shader. This is on cs_5_0 on a GTX 980 Ti with the 361.43 drivers.
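One possible explanation, offered as a hypothesis since the full shader isn't shown: if threads read each other's slots, reusing the array needs a barrier not only between each write and the following reads, but also between the last read of the first values and the second write. Without that second barrier, a fast thread can overwrite `sm` with `v2` while a slower thread in the same group is still reading the first values (a write-after-read hazard). A separate array avoids the hazard, which would match the observation that it "works fine". Below is a minimal sketch, assuming 64 threads per group and that each thread reads its neighbor's slot; the placeholder values and the `Output` buffer are hypothetical, added only so the shader is complete.

```hlsl
// Sketch: reusing one groupshared array safely (assumes 64 threads per group).
groupshared float4 sm[64];

RWStructuredBuffer<float4> Output : register(u0); // hypothetical output buffer

[numthreads(64, 1, 1)]
void main(uint thread_idx : SV_GroupIndex, uint3 gid : SV_GroupID)
{
    float4 v  = float4(thread_idx, 0, 0, 0);   // placeholder first value
    float4 v2 = float4(0, thread_idx, 0, 0);   // placeholder second value

    // First use: write, then wait until every thread's write is visible.
    sm[thread_idx] = v;
    GroupMemoryBarrierWithGroupSync();
    float4 a = sm[(thread_idx + 1) % 64];      // read another thread's slot

    // Wait until every thread has finished reading the first values
    // before anyone overwrites the array (write-after-read hazard).
    GroupMemoryBarrierWithGroupSync();

    // Second use of the same array.
    sm[thread_idx] = v2;
    GroupMemoryBarrierWithGroupSync();
    float4 b = sm[(thread_idx + 1) % 64];

    Output[gid.x * 64 + thread_idx] = a + b;
}
```

If each thread only ever reads its own slot `sm[thread_idx]`, program order alone should keep the values consistent, so corruption in that case would point elsewhere.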


Ping. Any thoughts? I'm wondering whether it's a driver bug or a spec limitation.