Hi,
I was running some tests to measure the performance of cudaMalloc and clCreateBuffer.
I found that the OpenCL call is more than an order of magnitude faster than the CUDA call.
(A factor of 10 or more for small buffers, and the gap grows with buffer size, since CUDA takes longer for large buffers but OpenCL doesn’t.)
At first I suspected a measurement error, but everything seems to be all right.
Why is cudaMalloc so much slower?