I'm having a lot of problems running my GPU code under Windows 7. Under Linux I can easily get 45 GFLOPS on my Tesla (which is good for this application…), but on Windows I'm stuck at 7 GFLOPS.
I use a lot of small kernels (I know this is bad…) and I read that WDDM increases the latency of each kernel call (I read 40 µs instead of 3 µs!).
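To check whether those latency figures could even explain a gap this large, here is a rough back-of-envelope sketch in Python. It assumes (my assumption, not something I measured) that every kernel does the same amount of work, that launches are fully serialised with no overlap, and that the only difference between the two systems is the per-launch overhead. The names `effective_gflops` and `fit_kernel` are made up for this illustration.

```python
# Back-of-envelope model (assumption, not a measurement):
# every kernel costs launch overhead + compute time, with no overlap.

def effective_gflops(flops_per_kernel, compute_s, launch_s):
    """Sustained GFLOPS when each kernel pays a fixed launch cost."""
    return flops_per_kernel / (compute_s + launch_s) / 1e9


def fit_kernel(gflops_a, launch_a, gflops_b, launch_b):
    """Solve for (flops_per_kernel, compute_s) matching two observations.

    From rate_a * (t + launch_a) == rate_b * (t + launch_b), solve for t.
    """
    rate_a, rate_b = gflops_a * 1e9, gflops_b * 1e9
    compute_s = (rate_b * launch_b - rate_a * launch_a) / (rate_a - rate_b)
    flops = rate_a * (compute_s + launch_a)
    return flops, compute_s


# Numbers from this post: 45 GFLOPS at ~3 us (Linux), 7 GFLOPS at ~40 us (Windows).
flops, compute_s = fit_kernel(45, 3e-6, 7, 40e-6)
print(f"implied compute time per kernel: {compute_s * 1e6:.2f} us")
print(f"implied work per kernel: {flops:.0f} flops")
```

Under those assumptions the two measurements are consistent with kernels that each do only about 3.8 µs of actual compute, so a 40 µs launch overhead would dominate the run time on Windows. This is only a model; the real driver may batch or overlap launches, so the numbers are indicative at best.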
Any clues? Is there anything I can do to speed up execution on Windows?
Thanks a lot!