I mean: is it enough to call cuInit once at the beginning of the application, or must each worker thread call it before doing anything with the driver API?
Once per process. (Believe it or not, lately I’ve been writing docs on exactly how contexts behave, so hopefully you’ll see that in the not-too-distant future.)
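For anyone who wants to see the "once per process" pattern spelled out, here is a rough sketch. It is only an illustration, not code from the driver docs: `driver_init_stub` is a made-up stand-in for `cuInit(0)` so the example runs without a GPU, and `ensure_driver_initialized` and `run_workers` are hypothetical helper names. The idea is that any worker may ask for initialization, but the init body runs a single time for the whole process.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// ASSUMPTION: stand-in for cuInit(0) so this sketch runs without a GPU.
// In a real program you would call the driver function here instead.
static std::atomic<int> g_init_calls{0};
static int driver_init_stub() {
    g_init_calls.fetch_add(1);  // count how often the init body really runs
    return 0;                   // 0 mirrors CUDA_SUCCESS
}

static std::once_flag g_init_once;
static int g_init_status = -1;

// Hypothetical helper: safe to call from any thread.
// std::call_once guarantees the lambda executes only once per process.
int ensure_driver_initialized() {
    std::call_once(g_init_once, [] { g_init_status = driver_init_stub(); });
    return g_init_status;
}

// Hypothetical helper: starts n workers and returns how many of them
// observed a successful initialization.
int run_workers(int n) {
    std::atomic<int> ok{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&ok] {
            if (ensure_driver_initialized() == 0) {
                ok.fetch_add(1);
            }
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return ok.load();
}
```

Calling the init function once in `main` before spawning any threads works just as well; the `std::call_once` guard is only there for cases where you cannot be sure which thread gets to the driver first.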
Cool! That part was really lacking :)