I found that some float3 operations sometimes generate error messages under cuda-memcheck (e.g., +=, -=, and possibly some float3 functions such as dot()). The reported error is an invalid write access to a 4-byte word. I do not believe the generated code is actually in error; I think these are spurious messages from cuda-memcheck. Explicitly rewriting the operation makes the error go away.
This does NOT happen on all such statements, but it is repeatable when it does happen, so it's likely related to what surrounds the offending statement.
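To make "explicitly rewriting the operation" concrete, here is a minimal sketch of the two forms, assuming the rewrite means updating each component separately. This is not code from the application: the kernel names, array names, and sizes are made up for illustration, and the operator+= overload is an assumption (CUDA's built-in float3 has no arithmetic operators, so some overload of this kind has to come from a helper header or from the application). Since the problem depends on the surrounding code, this small example is not expected to reproduce the cuda-memcheck message on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed overload (hypothetical): float3 has no built-in operators,
// so the application must get += from somewhere like this.
__host__ __device__ inline void operator+=(float3 &a, const float3 &b)
{
    a.x += b.x;
    a.y += b.y;
    a.z += b.z;
}

// Operator form: the kind of statement that sometimes triggered the report.
__global__ void accumulateOperator(float3 *acc, const float3 *delta, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        acc[i] += delta[i];
    }
}

// Explicit form: the same update written out one component at a time.
__global__ void accumulateExplicit(float3 *acc, const float3 *delta, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float3 d = delta[i];
        acc[i].x += d.x;
        acc[i].y += d.y;
        acc[i].z += d.z;
    }
}

int main()
{
    const int n = 256;
    const size_t bytes = n * sizeof(float3);

    // Host buffer, reused for the input deltas and for the result.
    float3 h[n];
    for (int i = 0; i < n; ++i) {
        h[i] = make_float3(1.0f, 2.0f, 3.0f);
    }

    float3 *acc = NULL;
    float3 *delta = NULL;
    cudaMalloc(&acc, bytes);
    cudaMalloc(&delta, bytes);
    cudaMemset(acc, 0, bytes);  // all-zero bytes are 0.0f for each component
    cudaMemcpy(delta, h, bytes, cudaMemcpyHostToDevice);

    // Apply the same update once with each form.
    accumulateOperator<<<(n + 127) / 128, 128>>>(acc, delta, n);
    accumulateExplicit<<<(n + 127) / 128, 128>>>(acc, delta, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h, acc, bytes, cudaMemcpyDeviceToHost);
    printf("acc[0] = (%g, %g, %g)\n", h[0].x, h[0].y, h[0].z);

    cudaFree(acc);
    cudaFree(delta);
    return 0;
}
```

Both kernels should produce identical results. The only point of the sketch is to show which statement was swapped for which when the message went away.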
Unfortunately, I can’t really post the application in a form you can use. It’s large (thousands of lines in total, with ~1600 lines of actual GPU code). I will test again to see if it still occurs. I was using the beta version of 5.0 when this happened, so I should check the production version.