In CUDA code we have cudaMallocAsync and cudaFreeAsync. Does that mean we don't need to allocate as much memory as the data size, i.e. we can allocate less and reuse the buffer?
For example, we have an array of 10 elements, but we allocate space for only 5 elements and reuse it.
Can we?

You can always reuse an allocation (as long as you haven’t freed it.) Beyond that, I’m not sure what you mean by

If you allocate space for 5 elements, then you can store the first half of the 10-element array there, and later on store the second half of the 10-element array there.

You can do the same thing with cudaMalloc. So I may not be grasping your question.
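To make the reuse concrete, here is a minimal sketch with plain cudaMalloc: one device buffer of `chunk` elements is allocated once and then reused for each part of a larger host array. The kernel `doubleElements` and the helper `processInChunks` are hypothetical names made up for this illustration, and the doubling is just a stand-in for whatever computation is needed.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void doubleElements(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical helper: pushes `host` through a device buffer that holds
// only `chunk` elements. Returns true if every checked CUDA call succeeded.
bool processInChunks(std::vector<int> &host, int chunk) {
    int *dBuf = nullptr;
    // One allocation, sized for a single chunk rather than the whole array.
    if (cudaMalloc(&dBuf, chunk * sizeof(int)) != cudaSuccess) return false;

    bool ok = true;
    const int total = static_cast<int>(host.size());
    for (int offset = 0; offset < total && ok; offset += chunk) {
        const int n = (total - offset < chunk) ? (total - offset) : chunk;
        // Copy this part of the array into the same device buffer.
        ok = cudaMemcpy(dBuf, host.data() + offset, n * sizeof(int),
                        cudaMemcpyHostToDevice) == cudaSuccess;
        if (!ok) break;
        doubleElements<<<(n + 255) / 256, 256>>>(dBuf, n);
        // This blocking copy runs after the kernel on the default stream,
        // so the buffer is safe to overwrite in the next iteration.
        ok = cudaMemcpy(host.data() + offset, dBuf, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dBuf);
    return ok;
}
```

Calling `processInChunks(v, 5)` on a 10-element vector would process it in two passes through the same 5-element buffer.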

For example, I have an array of 100 elements on the CPU, and I need to copy that array to the GPU to compute something.

Normally, when using cudaMalloc, there are 2 ways:

  1. allocate 100 elements once

  2. allocate 50 elements, copy H2D and compute, then reuse that space for the remaining 50.

With the second way (not comparing the speed of the two), we only need 50 * sizeof(int) bytes of space, which is better than the 100 * sizeof(int) bytes of the first way. BUT in both ways we need 2 time units to allocate 100 elements (assuming, for example, that allocating 50 elements takes 1 time unit).

So we can improve the second way by using cudaMallocAsync, which means we overlap the time spent allocating elements with other work. I think I just answered my own question. Thanks.
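A rough sketch of what the stream-ordered version might look like, assuming CUDA 11.2 or newer (where cudaMallocAsync and cudaFreeAsync are available) and a device that supports memory pools. The kernel `addOne` and the helper `processStreamOrdered` are hypothetical names for this illustration. Each chunk gets its own allocation and free, enqueued in the same stream as the copies and the kernel; the sketch does not measure whether the allocation time is actually hidden.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to each element in place.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: allocates, fills, processes, and frees one chunk at
// a time, with every operation ordered on a single stream.
bool processStreamOrdered(std::vector<int> &host, int chunk) {
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    bool ok = true;
    const int total = static_cast<int>(host.size());
    for (int offset = 0; offset < total && ok; offset += chunk) {
        const int n = (total - offset < chunk) ? (total - offset) : chunk;
        const size_t bytes = n * sizeof(int);

        void *raw = nullptr;
        // Stream-ordered allocation: usable by work enqueued later in `stream`.
        ok = cudaMallocAsync(&raw, bytes, stream) == cudaSuccess;
        if (!ok) break;
        int *dChunk = static_cast<int *>(raw);

        // `host` is pageable memory here, so these copies may block the CPU;
        // pinned memory (cudaMallocHost) would be needed for truly async copies.
        ok = cudaMemcpyAsync(dChunk, host.data() + offset, bytes,
                             cudaMemcpyHostToDevice, stream) == cudaSuccess;
        if (ok) {
            addOne<<<(n + 255) / 256, 256, 0, stream>>>(dChunk, n);
            ok = cudaMemcpyAsync(host.data() + offset, dChunk, bytes,
                                 cudaMemcpyDeviceToHost, stream) == cudaSuccess;
        }
        // Stream-ordered free: takes effect after the work enqueued above, and
        // the memory returns to the pool for the next iteration's allocation.
        ok = (cudaFreeAsync(dChunk, stream) == cudaSuccess) && ok;
    }
    // Wait for everything enqueued in the stream before reading `host`.
    ok = (cudaStreamSynchronize(stream) == cudaSuccess) && ok;
    cudaStreamDestroy(stream);
    return ok;
}
```

Because the free and the next allocation are ordered on the same stream, the peak device footprint stays at about one chunk, as in the second way above.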