Memory management OpenCL vs. CUDA

Hi,

I ran some tests to measure the performance of cudaMalloc and clCreateBuffer.
I found that the OpenCL call is an order of magnitude or more faster than the CUDA call.

(A factor of 10+ for small buffers, and the gap grows with buffer size, since CUDA takes longer for large buffers but OpenCL doesn't.)

At first I suspected a measurement error, but everything seems to be alright.
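
A measurement of this kind could be sketched roughly as follows. This is only an illustration, not the actual test code: the names average_microseconds and host_alloc_free are hypothetical, and the malloc/free body is a host-side stand-in so the sketch compiles without a GPU toolchain. In a real run the timed callable would wrap cudaMalloc plus cudaFree, or clCreateBuffer plus clReleaseMemObject. The untimed warm-up call is there because the first CUDA runtime call also pays for context creation, which would otherwise be counted as allocation time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Average wall-clock time per call of `op`, in microseconds.
// One untimed warm-up call absorbs one-time setup cost
// (for CUDA, the first runtime call also creates the context).
template <typename Op>
double average_microseconds(Op op, int iterations) {
    assert(iterations > 0);
    op();  // warm-up, not timed
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        op();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Hypothetical host-side stand-in for the allocation under test.
// Replace the body with cudaMalloc + cudaFree, or with
// clCreateBuffer + clReleaseMemObject, to time the real calls.
inline void host_alloc_free(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) {
        // Touch the first byte so the allocation is actually used.
        static_cast<volatile char*>(p)[0] = 1;
    }
    std::free(p);
}
```

Calling average_microseconds([] { host_alloc_free(1 << 20); }, 1000) would give the mean time per allocate/free pair for a 1 MiB buffer; repeating that over several buffer sizes gives the size dependence described above.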

Why is cudaMalloc so much slower?