CUDA 7.5 gives a 30% performance loss vs CUDA 6.5.

Yes, the Groestl implementation is impressive. The optimizations I have done just remove assembly instructions. The algorithm is intact, I just do it with fewer instructions. I also removed some conditional code that was slower, and changed the launch bounds. It is still using 64 regs… But in quark, more work has been done in the other algos: uint2 rewrites of all routines that use the 64-bit rotates (Blake, Skein, Keccak, etc.), and register tuning for Maxwell. Quark was already pretty fast in ccminer 1.2, but now it is faster…
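
To illustrate what a uint2 rewrite of a 64-bit rotate means, here is a rough host-side C++ sketch. It is not the ccminer code: `U2` is a made-up stand-in for CUDA's `uint2`, and the helper names `to_u2`, `to_u64`, `rotl64_ref` and `rotl64_u2` are invented for the example. The idea is only that a 64-bit rotate can be done on two 32-bit halves, where a rotate by 32 or more starts with a plain swap of the halves.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical stand-in for CUDA's uint2: x holds the low 32 bits, y the high 32 bits.
struct U2 { uint32_t x, y; };

// Split a 64-bit word into two 32-bit halves, and join them back.
static inline U2 to_u2(uint64_t v) {
    return { static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32) };
}
static inline uint64_t to_u64(U2 v) {
    return (static_cast<uint64_t>(v.y) << 32) | v.x;
}

// Reference: rotate left on a native 64-bit value.
static inline uint64_t rotl64_ref(uint64_t v, unsigned n) {
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

// Same rotate, done only with 32-bit operations on the two halves.
static inline U2 rotl64_u2(U2 v, unsigned n) {
    n &= 63;
    if (n >= 32) {              // rotating by 32 is just swapping the halves
        uint32_t t = v.x;
        v.x = v.y;
        v.y = t;
        n -= 32;
    }
    if (n == 0) return v;       // avoid an undefined shift by 32 below
    U2 r;
    r.x = (v.x << n) | (v.y >> (32 - n));  // new low half takes the top bits of the old high half
    r.y = (v.y << n) | (v.x >> (32 - n));  // new high half takes the top bits of the old low half
    return r;
}
```

Checking `to_u64(rotl64_u2(to_u2(v), n))` against `rotl64_ref(v, n)` for a few values of `n` is an easy way to convince yourself the two agree. On the GPU the shifted-and-ORed pairs can probably be mapped to funnel shifts such as `__funnelshift_l`, but whether a given kernel does that is an assumption here, not something stated above.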

The GTX 970 is mining quark 850% faster than an R9 280X (open-source sgminer) with sp-mod release 63.