cudaEventRecord, cudaMalloc and cudaFree block and spin the CPU while a kernel is running

I have a very weird problem and could use some insight. In my program I set the device flags to cudaDeviceScheduleYield and create all events with cudaEventBlockingSync. According to the documentation there should be no spinning, but while a long-running kernel is executing, cudaMalloc, cudaFree and cudaEventRecord appear to freeze (at least that is what the profiler shows). The API seems to be spinning on the CPU, because one of my cores sits at 100% whenever this happens. Is this supposed to happen? For the record, I am running a multi-threaded application, and each thread may make independent CUDA calls in parallel. Could that be the cause?
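For reference, the setup described above can be sketched roughly as follows. This is a minimal single-threaded reproduction, not my actual code: `busyKernel`, `runRepro`, the cycle count, and the 1 MiB allocation size are all made up for illustration, and the real application issues these calls from several host threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running kernel: busy-waits on the GPU
// for roughly `cycles` clock ticks.
__global__ void busyKernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Return early with the error code if a runtime call fails.
// (Sketch only: resources are not released on the error path.)
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) return e_; } while (0)

// Issues the same sequence of calls as described in the question.
cudaError_t runRepro() {
    // Set the scheduling flag first, before any other runtime call.
    CHECK(cudaSetDeviceFlags(cudaDeviceScheduleYield));

    // Event created with the blocking-sync flag, as in the question.
    cudaEvent_t done;
    CHECK(cudaEventCreateWithFlags(&done, cudaEventBlockingSync));

    // Launch the long-running kernel on the default stream (assumed cycle count).
    busyKernel<<<1, 1>>>(1000000000LL);
    CHECK(cudaGetLastError());

    // These are the calls the profiler reports as stalling while the kernel runs.
    void* p = nullptr;
    CHECK(cudaMalloc(&p, 1 << 20));
    CHECK(cudaEventRecord(done, 0));
    CHECK(cudaFree(p));

    // Wait for the event, then clean up.
    CHECK(cudaEventSynchronize(done));
    CHECK(cudaEventDestroy(done));
    return cudaSuccess;
}

int main() {
    cudaError_t err = runRepro();
    std::printf("result: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Watching CPU usage (for example with `top`) while this runs should show whether a core is pegged during the kernel.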