Which thread scheduling mode should I choose when using multiple CUDA encoders?

My program uses multiple CUDA video encoders (2, 3, or more) to output its rendered image as multiple H.264 streams with different bitrates. The image to encode is read directly from OpenGL. I create the CUDA context with the "CU_CTX_SCHED_BLOCKING_SYNC" thread scheduling mode (the same as the SDK sample), but according to the documentation, this mode will block the OpenGL thread until the encode task has finished (I call "NVEncodeFrame()" on the OpenGL thread). This is a problem, especially with multiple encoders: my program generates images periodically, so every task on the OpenGL thread should be asynchronous to get the best performance, and "CU_CTX_SCHED_BLOCKING_SYNC" seems to break that rule. Which other mode should I use instead: AUTO, SPIN, or YIELD?

I have another question: I found that CudaEncAPI can set a cuctxLock on the encoder object, so should I call "NVEncodeFrame()" on the OpenGL thread or on a new thread?
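
To make the "new thread" option concrete, here is a minimal sketch of what I have in mind, written with the C++ standard library only. `EncoderWorker`, `Frame`, and the `encode` callback are hypothetical names for illustration; the callback is just a stand-in for wherever the blocking "NVEncodeFrame()" call would go, and no real CUDA or encoder API is used here. It also ignores how the CUDA context and the OpenGL image would be shared between threads, which is exactly the part I am unsure about.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame handle; in the real program this would identify the
// OpenGL image to encode.
struct Frame {
    std::uint64_t index;
};

// One worker per encoder: the render thread only enqueues frames, and the
// blocking encode call runs on the worker's own thread.
class EncoderWorker {
public:
    // `encode` is a placeholder for the blocking encode call.
    explicit EncoderWorker(std::function<void(const Frame&)> encode)
        : encode_(std::move(encode)), thread_([this] { run(); }) {}

    // Drains any queued frames, then stops and joins the worker thread.
    ~EncoderWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Called from the render thread; returns without waiting for the encode.
    void submit(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(frame);
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            Frame frame;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return;  // stop requested and nothing left to encode
                }
                frame = queue_.front();
                queue_.pop();
            }
            encode_(frame);  // may block; only this worker thread waits
        }
    }

    std::function<void(const Frame&)> encode_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Frame> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the other members exist first
};
```

With something like this, the OpenGL thread would create one `EncoderWorker` per stream and call `submit()` once per rendered frame, so a blocking scheduling mode would only stall the worker threads. Is that a reasonable design, or is the encoder meant to be driven from the OpenGL thread?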