NVCreateHWEncoder fails in multithreaded applications!

I created a multithreaded application, and in each thread, I create a CUDA Encoder.

However, the call to NVCreateHWEncoder often fails (it returns E_FAIL), and the more threads I use, the more likely it is to fail.

2 threads: usually all threads successfully create an encoder.
4 threads: all threads succeed in about half of the runs.
8 threads: in most runs at least one thread fails, but occasionally all threads succeed.
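
One way to tell a race during creation apart from a resource limit is to run the same test with the creation calls serialized and compare the failure counts. The following is only a minimal sketch of such a harness, not the actual application code: `CreateFn` and `count_creation_failures` are hypothetical names, and the callback is a stand-in for whatever wraps the real NVCreateHWEncoder call (which is not shown here and whose exact signature is not assumed).

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real encoder-creation call.
// It should return true on success and false on failure (e.g. E_FAIL).
using CreateFn = std::function<bool()>;

// Starts num_threads threads, each calling create() exactly once,
// and returns how many of those calls reported failure.
// If serialize is true, a single mutex guards the calls so that
// only one creation runs at a time.
int count_creation_failures(int num_threads, const CreateFn& create, bool serialize) {
    assert(num_threads > 0);

    std::mutex create_mutex;
    std::atomic<int> failures{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&] {
            bool ok = false;
            if (serialize) {
                // Only one thread at a time may attempt creation.
                std::lock_guard<std::mutex> lock(create_mutex);
                ok = create();
            } else {
                // All threads attempt creation concurrently.
                ok = create();
            }
            if (!ok) {
                failures.fetch_add(1);
            }
        });
    }

    // All threads are joined before the locals they reference go out of scope.
    for (auto& t : workers) {
        t.join();
    }
    return failures.load();
}
```

Calling it with 2, 4, and 8 threads, once with `serialize` set to false and once set to true, would show whether the failures depend on concurrent creation or only on the number of encoders that exist at the same time.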

The frame size is 704x480.

I use a GTX 470 with 1280 MB of GDDR5, driver 270.32, and CUDA 4.0 RC.

I want to know how many resources are needed to create an encoder. Why do all 8 threads sometimes succeed, while a run with only 2 threads sometimes fails?

Thank you.

Can anybody help?