I have a question that may be too simple for some, but I am confused.
I have written a program that calls cudaMalloc; it works fine, but the call takes around 51 ms to complete. I have another program that uses the cuSPARSE library to do the same task, but there cudaMalloc takes much less time (5 ms) for the same data type and the same size as before. Why is that?
When I profiled it, I noticed that in the second (cuSPARSE) case a cudaFree runs before the cudaMalloc, even though I have no cudaFree statement at that point… Is the library cleaning up memory before any operation is called?