http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/VisualProfiler/computeprof.html
says that the warp_serialize counter is not supported on compute capability 2.0 (Fermi) devices.
Is there a replacement?
Thank you
Bill
PS: I have started using atomicCAS on shared memory and the code is a lot slower.
So far I have noticed that inst_issued/gputime is 288 per microsecond, about half of what it was before
the latest “improvements”.