I found that some float3 operations (e.g., +=, -=, and possibly float3 helper functions such as dot()) generate error messages under cuda-memcheck in certain places. I do not believe the generated code is wrong; I think these are spurious reports from cuda-memcheck. Rewriting the operation explicitly makes the error go away (the error reports an invalid write of a 4-byte word).
float div_v; float3 dr,dv;
div_v += dot(dr,dv); // cuda-memcheck error
div_v += (dr.x*dv.x + dr.y*dv.y + dr.z*dv.z); // OK under cuda-memcheck
This does NOT happen on every such statement, but when it does happen it is repeatable, so it is likely related to the code surrounding the offending statement.
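A minimal kernel that places both forms side by side can help narrow this down. The sketch below is hypothetical: `dot3` is a local stand-in for the `dot(float3, float3)` overload in the CUDA samples' `helper_math.h` (assumed to be the header in use), and `divBothWays` / `runDivBothWays` are illustrative names. Compiling with `nvcc -lineinfo` lets cuda-memcheck attribute any report to a source line. Because the error seems to depend on the surrounding code, this isolated version may well run clean; in that case, copying the real neighbouring statements into the kernel is the next step.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for helper_math.h's dot(float3, float3).
__host__ __device__ inline float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Each thread accumulates dr[i].dv[i] twice, mirroring the two lines above:
// once through the helper and once with the expression written out.
__global__ void divBothWays(const float3* dr, const float3* dv, int n,
                            float* viaHelper, float* viaExplicit) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 r = dr[i], v = dv[i];
    float divA = 0.0f;
    float divB = 0.0f;
    divA += dot3(r, v);                          // form reported to trigger the error
    divB += (r.x * v.x + r.y * v.y + r.z * v.z); // form reported as clean
    viaHelper[i] = divA;
    viaExplicit[i] = divB;
}

// Host driver: fills n identical inputs, runs the kernel, and copies both
// result arrays back. Returns false if any CUDA call fails.
bool runDivBothWays(int n, float* hostHelper, float* hostExplicit) {
    float3* dDr = nullptr; float3* dDv = nullptr;
    float* dA = nullptr; float* dB = nullptr;
    size_t vecBytes = n * sizeof(float3), fBytes = n * sizeof(float);
    float3* hDr = new float3[n];
    float3* hDv = new float3[n];
    for (int i = 0; i < n; ++i) {
        hDr[i] = make_float3(1.0f, 2.0f, 3.0f);
        hDv[i] = make_float3(4.0f, 5.0f, 6.0f);
    }
    bool ok = cudaMalloc(&dDr, vecBytes) == cudaSuccess &&
              cudaMalloc(&dDv, vecBytes) == cudaSuccess &&
              cudaMalloc(&dA, fBytes) == cudaSuccess &&
              cudaMalloc(&dB, fBytes) == cudaSuccess &&
              cudaMemcpy(dDr, hDr, vecBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dDv, hDv, vecBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int block = 128, grid = (n + block - 1) / block;
        divBothWays<<<grid, block>>>(dDr, dDv, n, dA, dB);
        // cudaMemcpy is synchronous, so the kernel has finished before the copy.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostHelper, dA, fBytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(hostExplicit, dB, fBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dDr); cudaFree(dDv); cudaFree(dA); cudaFree(dB);
    delete[] hDr;
    delete[] hDv;
    return ok;
}
```

With the inputs above, both forms should yield 1*4 + 2*5 + 3*6 = 32 in every element. On newer toolkits, where cuda-memcheck has been replaced, `compute-sanitizer` performs the same memory check.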