Is there any relationship between POSIX semaphores and cudaMemcpy from pinned memory?

I’m running some experiments on the producer-consumer problem using POSIX semaphores.

The producer writes into a pinned buffer, and the consumer calls cudaMemcpy(device_pointer, pinned_buffer, buffer_size, cudaMemcpyHostToDevice).
No errors are reported, but the bandwidth of each HtoD copy is relatively low.
My system’s usual pinned-memory HtoD bandwidth is 12 GB/s, but in this setup it drops to 7~8 GB/s.
I can’t figure out what is causing this.
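
To make the setup concrete, here is a minimal sketch of the kind of handoff described above. It is illustrative only, not the actual code. It assumes a single buffer guarded by two POSIX semaphores on Linux. All names (handoff_t, run_handoff, and so on) are made up, and a plain memcpy between malloc'ed buffers stands in for the cudaMemcpy from pinned memory so that it builds without CUDA. The timer starts only after sem_wait returns, so the reported figure covers the copy alone and not the time spent waiting for the producer.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* All names here are hypothetical; nothing below is from the original post. */
typedef struct {
    sem_t empty;          /* posted by the consumer: buffer may be overwritten */
    sem_t full;           /* posted by the producer: buffer holds fresh data   */
    unsigned char *src;   /* stand-in for the pinned host buffer               */
    unsigned char *dst;   /* stand-in for the device buffer                    */
    size_t bytes;
    int rounds;
    double copy_seconds;  /* time spent inside the copy only                   */
} handoff_t;

static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);  /* C11 wall clock; good enough for a sketch */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static void *producer(void *arg) {
    handoff_t *h = arg;
    for (int i = 0; i < h->rounds; ++i) {
        sem_wait(&h->empty);                 /* wait until the buffer is free */
        memset(h->src, i & 0xff, h->bytes);  /* "produce" one buffer of data  */
        sem_post(&h->full);
    }
    return NULL;
}

static void *consumer(void *arg) {
    handoff_t *h = arg;
    for (int i = 0; i < h->rounds; ++i) {
        sem_wait(&h->full);                  /* waiting time is NOT measured */
        double t0 = now_seconds();
        /* Assumption: in the real setup this line would be
           cudaMemcpy(device_pointer, pinned_buffer, buffer_size,
                      cudaMemcpyHostToDevice); */
        memcpy(h->dst, h->src, h->bytes);
        h->copy_seconds += now_seconds() - t0;
        sem_post(&h->empty);
    }
    return NULL;
}

/* Runs `rounds` handoffs of `bytes` each (bytes must be > 0).
   Returns copy-only GB/s, or -1.0 on failure or if no time was measured.
   *last_byte receives the first byte of the final copied buffer. */
double run_handoff(size_t bytes, int rounds, unsigned char *last_byte) {
    handoff_t h;
    pthread_t prod, cons;
    double gbps = -1.0;

    h.bytes = bytes;
    h.rounds = rounds;
    h.copy_seconds = 0.0;
    h.src = malloc(bytes);  /* real setup: pinned allocation, e.g. cudaMallocHost */
    h.dst = malloc(bytes);  /* real setup: device allocation, e.g. cudaMalloc     */
    if (h.src == NULL || h.dst == NULL) {
        free(h.src);
        free(h.dst);
        return -1.0;
    }
    sem_init(&h.empty, 0, 1);  /* buffer starts out free */
    sem_init(&h.full, 0, 0);   /* nothing to consume yet */

    pthread_create(&prod, NULL, producer, &h);
    pthread_create(&cons, NULL, consumer, &h);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    if (h.copy_seconds > 0.0)
        gbps = (double)bytes * (double)rounds / h.copy_seconds / 1e9;
    *last_byte = h.dst[0];

    sem_destroy(&h.empty);
    sem_destroy(&h.full);
    free(h.src);
    free(h.dst);
    return gbps;
}
```

Calling run_handoff((size_t)1 << 20, 8, &last) runs eight 1 MiB handoffs (build with -pthread). With the timer placed after sem_wait, any gap between this number and a standalone bandwidth test is down to the copy itself rather than to semaphore waiting.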

Does anyone know what might be going on? If so, please let me know. Thanks.