Is cuFFT plan creation serialized across multiple pthreads?

I am spawning a number of pthreads, each of which creates its own stream (streamId) and its own cuFFT plan. The time each plan creation takes grows with the order in which the threads run. For example, the first thread takes ~400 ms to generate its plan, the second reports ~800 ms, the third ~1200 ms, and so on until all threads complete. This suggests that, even though these are separate threads, cuFFT plan generation is serialized, so the total time to complete is N * 400 ms. Is this correct, and is there a way to have the plans created in parallel?