NPP initialization is very slow

Hi, I am using the NPP library with CUDA 6.0. I found that the first call to an NPP function is slow, so I added a warm-up step at program start-up.
Recently, however, I found that the warm-up function itself, which contains only a single nppi_add call, is very slow: it takes 40 to 50 seconds.
So I would like to know what NPP initializes internally when I call an NPP function such as nppi_add.
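
In case it helps to narrow this down, here is a minimal sketch of how the first-call cost could be measured against a second call. It is only an illustration: `time_call_ms`, `warm_up_placeholder`, and `report_warm_up_timing` are hypothetical names, and the placeholder does a small host-side add so the snippet compiles without CUDA. In the real program the placeholder body would be the single NPP add call followed by `cudaDeviceSynchronize()`.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Time a single invocation of `fn` and return the elapsed wall-clock
// time in milliseconds.
template <typename Fn>
double time_call_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for the real warm-up (assumption, not NPP code).
// In the actual program this body would be the one NPP add call plus
// cudaDeviceSynchronize(), so that the timing covers the GPU work too.
int warm_up_placeholder() {
    std::vector<int> a(1024, 1), b(1024, 2);
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] + b[i];
    }
    return sum;
}

// Run the warm-up twice and print both timings. A large gap between the
// two suggests one-time initialization cost rather than the add itself.
void report_warm_up_timing() {
    const double first_ms = time_call_ms([] { warm_up_placeholder(); });
    const double second_ms = time_call_ms([] { warm_up_placeholder(); });
    std::printf("first call:  %.3f ms\n", first_ms);
    std::printf("second call: %.3f ms\n", second_ms);
}
```

Calling `report_warm_up_timing()` once from `main` prints the two timings. Timing a bare CUDA runtime call in the same way before the NPP call might also show whether the delay comes from CUDA context creation or from NPP itself, though that is a guess on my part.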

Thank you.

One more detail: this happens right after Windows 7 x64 has booted, while the desktop is not yet fully initialized.