[CUDA8.0 BUG?] Child process forked after cuInit() gets CUDA_ERROR_NOT_INITIALIZED on cuInit()

I’m not aware of any change in CUDA behavior in this regard. I’m definitely not a TF expert, but yes, I could imagine this issue affecting anyone using TF if they don’t follow the “rule” (don’t fork a process after it has initialized CUDA).

No, sorry, I don’t. You’re suggesting this is necessary:

    Parent Process (initializes CUDA)
       |                         |      
  child process1             child process 2

I don’t know why it cannot be refactored to:

    Parent Process (does not initialize CUDA)
       |                         |                          |
  child process1             child process 2      child process3
                                                initializes CUDA, 
                                                does whatever the parent process would have done.

and, just as you would otherwise, use IPC for whatever inter-process communication is needed. In fact, you said as much yourself:

That should be fine. Have a parent process that does not initialize CUDA. That parent process spawns any number of game processes, and also spawns a learner process to observe the games.
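
For illustration, here is a minimal Python sketch of that layout. Treat it as a sketch under assumptions, not a definitive implementation: `init_gpu`, `game_worker`, `learner`, and `run` are hypothetical names, and `init_gpu` is only a stand-in that never calls CUDA. In real code, that is where cuInit() (or the first use of a GPU framework) would happen. It uses the `fork` start method, so it assumes a Unix-like system.

```python
import multiprocessing as mp
import os


def init_gpu():
    # Hypothetical stand-in for CUDA initialization (e.g. cuInit(0) or the
    # first call into a GPU framework). Here it only reports the calling PID.
    return os.getpid()


def game_worker(worker_id, observations):
    # Game processes never touch CUDA; they only send data over IPC.
    observations.put(worker_id)


def learner(observations, n_games, results):
    # The learner is the only process that initializes CUDA, and it does so
    # after it has been forked, never in the parent before the fork.
    gpu_pid = init_gpu()
    seen = sorted(observations.get() for _ in range(n_games))
    results.put((gpu_pid, seen))


def run(n_games=2):
    # Forking is fine here because the parent holds no CUDA state to inherit.
    ctx = mp.get_context("fork")
    observations = ctx.Queue()
    results = ctx.Queue()
    procs = [
        ctx.Process(target=game_worker, args=(i, observations))
        for i in range(n_games)
    ]
    procs.append(ctx.Process(target=learner, args=(observations, n_games, results)))
    for p in procs:
        p.start()
    outcome = results.get()  # read the result before joining the children
    for p in procs:
        p.join()
    return outcome


if __name__ == "__main__":
    gpu_pid, seen = run()
    print("parent pid:", os.getpid(), "learner (CUDA) pid:", gpu_pid, "games:", seen)
```

The point is only the ordering: the parent forks everything first, and whichever child needs the GPU initializes it on its own side of the fork.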

Anyway, we don’t need to litigate this here. It’s entirely possible that there are things I don’t understand. Furthermore, I am not in control of CUDA behavior. Anyone who wants to see a change in CUDA behavior is welcome to file a bug.