nvcc/ptxas unnecessary lmem loads/stores: bug in nvcc alias analysis/PRE stages

Be careful using that. ;) (That's actually for other folks who find this thread; you guys probably know this.)

There’s a crapload of stuff in 3.0 that decuda currently can’t handle.