I have measured the speed of cudaMalloc, using the following call:
cudaMalloc((void **) &xc, n * sizeof(float));
When I allocated 10 * sizeof(float) bytes, the call above took about 6e-3 s, whereas a malloc/free pair on Linux took only about 4e-8 s. cudaMallocHost/cudaFreeHost took even longer than cudaMalloc/cudaFree.
The test was run on Fedora 13 with gcc 4.4, CUDA 3.2, and a Tesla C2050.
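For reference, here is a minimal sketch of how such a measurement could be set up. It is not my original test code. The helper names time_cuda_alloc and time_host_alloc and the iteration count of 1000 are made up for illustration, and std::chrono assumes a C++11-capable toolchain, which is newer than the gcc 4.4 / CUDA 3.2 setup above. The sketch calls cudaFree(0) once before timing, because the first CUDA runtime call also pays for one-time context creation. If a test times only the first cudaMalloc, that startup cost may account for part of the measured figure.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: average seconds per cudaMalloc/cudaFree pair.
// Returns -1.0 if an allocation fails.
static double time_cuda_alloc(size_t bytes, int iters) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        float *xc = nullptr;
        if (cudaMalloc((void **)&xc, bytes) != cudaSuccess) return -1.0;
        cudaFree(xc);
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iters;
}

// Hypothetical helper: average seconds per malloc/free pair.
// Returns -1.0 if an allocation fails.
static double time_host_alloc(size_t bytes, int iters) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        void *p = malloc(bytes);
        if (p == nullptr) return -1.0;
        // Touch the buffer through a volatile pointer so the compiler
        // is less likely to optimize the malloc/free pair away.
        *(volatile char *)p = 0;
        free(p);
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iters;
}

int main() {
    const size_t bytes = 10 * sizeof(float);  // same size as in the test above
    const int iters = 1000;                   // assumed repetition count

    // Warm-up: force CUDA context creation so it is excluded from the timing.
    if (cudaFree(0) != cudaSuccess) {
        fprintf(stderr, "CUDA initialization failed\n");
        return 1;
    }

    printf("cudaMalloc/cudaFree: %g s per pair\n", time_cuda_alloc(bytes, iters));
    printf("malloc/free:         %g s per pair\n", time_host_alloc(bytes, iters));
    return 0;
}
```

Built with nvcc, this prints an average per-pair time for each allocator, so the steady-state cost can be compared separately from the startup cost.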
Is there any way to make these functions faster? Thanks.