I’m new to CUDA, and when comparing the run time of my CUDA software against the old plain C version, I did not notice much improvement (it was actually slower). At first I thought this was because my matrix operations only involve relatively small matrices and vectors, but I just found out that the operation taking most of the time is the deallocation of memory!
It can take up to 10 seconds to free an array of 9000000 floats. However, this only happens for the first array: if I allocate several of them (allocation is fast, nearly instantaneous) and then free them, the first takes forever to free, and the others are lightning fast!
I allocate the memory using [font=“Courier New”]cudaMalloc((void**)&(storage->data), size * sizeof(float))[/font] and free it using [font=“Courier New”]cudaFree(storage->data)[/font].
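For reference, here is a minimal sketch of how the timing of each free could be isolated. It is not my actual code: the helper `now_ms`, the array count `kArrays`, and the overall layout are hypothetical and chosen only for illustration. It assumes that kernel launches are asynchronous and that `cudaFree` may wait for pending GPU work, so it calls `cudaDeviceSynchronize()` first and times that separately. If the long delay moves to the synchronize call, the time belongs to earlier GPU work rather than to the deallocation itself.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time in milliseconds.
static double now_ms() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double, std::milli>(
        clock::now().time_since_epoch()).count();
}

int main() {
    const size_t size = 9000000;   // number of floats per array, as in the post
    const int kArrays = 3;         // hypothetical number of arrays
    float *data[kArrays] = {nullptr, nullptr, nullptr};

    // Allocate the arrays, checking every return code.
    for (int i = 0; i < kArrays; ++i) {
        cudaError_t err = cudaMalloc((void**)&data[i], size * sizeof(float));
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaMalloc %d failed: %s\n", i, cudaGetErrorString(err));
            return 1;
        }
    }

    // ... the kernels operating on data[] would be launched here ...

    // Wait for all pending GPU work and time the wait on its own.
    double t0 = now_ms();
    cudaError_t err = cudaDeviceSynchronize();
    double t1 = now_ms();
    printf("synchronize: %.3f ms (%s)\n", t1 - t0, cudaGetErrorString(err));

    // Time each cudaFree individually.
    for (int i = 0; i < kArrays; ++i) {
        double start = now_ms();
        err = cudaFree(data[i]);
        double stop = now_ms();
        printf("cudaFree %d: %.3f ms (%s)\n", i, stop - start, cudaGetErrorString(err));
    }
    return 0;
}
```

This should build with something like `nvcc timing.cu -o timing` (the file name is arbitrary). Comparing the synchronize time with the first `cudaFree` time shows where the delay actually sits.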
What am I doing wrong?
Thanks in advance for any advice/insight on what might cause this and/or how to solve it!