My current program processes data really quickly, but the cudaMalloc call apparently takes about 5 times as long as the processing itself.
Is it possible to do the allocation only once at the beginning of the program, and then run multiple processing passes after that, reusing the same device memory?
For example, I’m trying to apply a calculation to multiple image files. They are all the same size, so I would like to allocate a buffer of that size at the beginning, and after that just copy each subsequent image to that same location.
I need to make CUDA compatible with our current library, so I can only write it as functions to be called from outside (no main()).
Currently I’m calling the function once to do the allocation, saving the pointer to the allocated memory, then passing it back out to the caller.
For the next images, I pass the pointer back in, skip the allocation, and use the pointer for cudaMemcpy. Apparently this is not giving me the right result.
Is this doable?
EDIT: nvm, apparently I made a mistake in the code!
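For reference, here is a minimal sketch of the allocate-once pattern described above, not my actual code. The names GpuImageBuffer, gpuInit, gpuProcess, gpuRelease and invertKernel are hypothetical, and the invert kernel is only a stand-in for the real calculation. It assumes every image has the same byte size and that all three functions are called from the same process.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical handle the outside library keeps between calls.
struct GpuImageBuffer {
    unsigned char* d_data;  // device pointer, allocated once
    size_t bytes;           // size of one image in bytes
};

// Placeholder calculation: invert every byte in place.
__global__ void invertKernel(unsigned char* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = (unsigned char)(255 - data[i]);
}

// Call once at startup. The handle is passed by pointer so the
// caller actually receives the device address.
cudaError_t gpuInit(GpuImageBuffer* buf, size_t bytes) {
    buf->d_data = nullptr;
    buf->bytes = 0;
    if (bytes == 0) return cudaErrorInvalidValue;
    cudaError_t err = cudaMalloc((void**)&buf->d_data, bytes);
    if (err == cudaSuccess) buf->bytes = bytes;
    return err;
}

// Call once per image: no allocation, only copies and a kernel launch.
cudaError_t gpuProcess(GpuImageBuffer* buf,
                       const unsigned char* h_in,
                       unsigned char* h_out) {
    cudaError_t err = cudaMemcpy(buf->d_data, h_in, buf->bytes,
                                 cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;

    const unsigned int threads = 256;
    const unsigned int blocks =
        (unsigned int)((buf->bytes + threads - 1) / threads);
    invertKernel<<<blocks, threads>>>(buf->d_data, buf->bytes);
    err = cudaGetLastError();  // catches launch errors
    if (err != cudaSuccess) return err;

    // This blocking copy also waits for the kernel to finish.
    return cudaMemcpy(h_out, buf->d_data, buf->bytes,
                      cudaMemcpyDeviceToHost);
}

// Call once at shutdown.
void gpuRelease(GpuImageBuffer* buf) {
    cudaFree(buf->d_data);
    buf->d_data = nullptr;
    buf->bytes = 0;
}
```

The caller runs gpuInit once, then gpuProcess for each image, then gpuRelease at the end. Checking the cudaError_t returned by every call is probably the quickest way to find out where a wrong result comes from.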