What factors limit the speed of cudaMalloc?

I am aware that cudaMalloc is slower than malloc, but I would like to know why. Is it solely due to the communication speed between the CPU and the device, or have I misunderstood this? Many thanks.