Memory leak, cudaMalloc, and cudaFree

Note: I’m pretty new to C++, so this might have a really easy answer.

I’m writing code that performs a simple image transformation. It runs in a loop, and I have noticed that it leaks memory. I debugged it and found that the source is the float* pointers I pass to cudaMalloc (see code). When the kernel is finished I call cudaFree on each pointer, but that frees only the array on the GPU. So each time the code runs it creates these 2 sets of 3 arrays on the host, and they are never deleted. I tried adding delete and free calls on them (to remove the arrays on the host), but I get an Access Violation error.

So what am I doing wrong here?

//THESE GUYS ARE THE PROBLEM
	float* tempin_gpu_r = new float[width * height];
	float* tempin_gpu_g = new float[width * height];
	float* tempin_gpu_b = new float[width * height];
	float* tempout_gpu_r = new float[width2 * height2];
	float* tempout_gpu_g = new float[width2 * height2];
	float* tempout_gpu_b = new float[width2 * height2];

//Allocate memory on the GPU
	cudaMalloc((void **)(&tempin_gpu_r), (width * height) * sizeof(float));
	cudaMalloc((void **)(&tempin_gpu_g), (width * height) * sizeof(float));
	cudaMalloc((void **)(&tempin_gpu_b), (width * height) * sizeof(float));
	cudaMalloc((void **)(&tempout_gpu_r), (width2 * height2) * sizeof(float));
	cudaMalloc((void **)(&tempout_gpu_g), (width2 * height2) * sizeof(float));
	cudaMalloc((void **)(&tempout_gpu_b), (width2 * height2) * sizeof(float));

//Copy the image data from the host onto the GPU (you don't need to see this)
//Kernel Setup and Call (you don't need to see this)
//Copy the data from the GPU back to the host (you don't need to see this)

	cudaFree(tempin_gpu_r);
	cudaFree(tempin_gpu_g);
	cudaFree(tempin_gpu_b);
	cudaFree(tempout_gpu_r);
	cudaFree(tempout_gpu_g);
	cudaFree(tempout_gpu_b);

You need to add delete tempin_gpu_r;
If that does not work, it looks like you have corrupted memory.
Ah, you do not need to call operator new at all; you only need to declare float* tempin_gpu_r;
You need to allocate memory twice, once on the CPU and once on the GPU, using different pointers, and then copy the result back from the GPU into the CPU allocation using cudaMemcpy.
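The advice above (separate host and device pointers, no `new` on the device pointer, `cudaMemcpy` in both directions) could look roughly like the sketch below. It is only an illustration under assumptions: `scaleKernel` and `scaleOnGpu` are hypothetical names, the kernel is a trivial stand-in for the real image transformation, and the host buffers are `std::vector`s rather than whatever the original code uses.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the image kernel: multiplies each value by a factor.
__global__ void scaleKernel(const float* in, float* out, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

// Returns true if every CUDA call succeeded.
// Host data lives in std::vector (released automatically); device data lives
// behind separate pointers that only cudaMalloc and cudaFree ever touch.
bool scaleOnGpu(const std::vector<float>& in_cpu, std::vector<float>& out_cpu, float factor)
{
    const int n = static_cast<int>(in_cpu.size());
    const size_t bytes = n * sizeof(float);
    out_cpu.resize(n);
    if (n == 0) return true;

    float* in_gpu = nullptr;    // declared only, no `new` here
    float* out_gpu = nullptr;

    bool ok = cudaMalloc((void**)&in_gpu, bytes) == cudaSuccess
           && cudaMalloc((void**)&out_gpu, bytes) == cudaSuccess
           && cudaMemcpy(in_gpu, in_cpu.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(in_gpu, out_gpu, n, factor);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out_cpu.data(), out_gpu, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(in_gpu);           // cudaFree on a null pointer does nothing
    cudaFree(out_gpu);
    return ok;
}
```

Calling `scaleOnGpu(in, out, 2.0f)` inside a loop should then leave nothing behind on either side: the device buffers are released by `cudaFree`, and the host buffers belong to the caller's vectors.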

I’m not sure I understand. If I add “delete tempin_gpu_r;” after my cudaFree calls, I get the Access Violation Reading error.

My GPU <-> host calls should be good:

cudaMemcpy(tempin_gpu_r, tempin_cpu_r, (width * height) * sizeof(float), cudaMemcpyHostToDevice);
...
cudaMemcpy(tempout_cpu_r, tempout_gpu_r, (width2 * height2) * sizeof(float), cudaMemcpyDeviceToHost);
cudaFree(tempin_gpu_r);
...

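You are using the same pointer for the allocation on the device and on the host.
new float[width * height];
That is a host memory allocation, and cudaMalloc then overwrites its address with a device address, so the host array can never be freed. Remove it.
Where is tempin_cpu_r defined?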
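To make the "same pointer" problem concrete, here is a small host-only sketch that needs no GPU. `fakeDeviceMalloc` is a hypothetical stand-in for `cudaMalloc` (it takes its memory from `std::malloc`), and `pointerSurvivesAllocation` is likewise an invented name; the only thing being illustrated is that a call taking `void**` stores a new address in the pointer and discards the old one.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for cudaMalloc: like the real call, it stores a fresh
// address in *ptr and ignores whatever *ptr held before.
// (The memory here comes from std::malloc, not from a GPU.)
int fakeDeviceMalloc(void** ptr, std::size_t bytes)
{
    *ptr = std::malloc(bytes);
    return *ptr ? 0 : 1;
}

// Reproduces the pattern from the question and reports whether the pointer
// still refers to the host array afterwards.
bool pointerSurvivesAllocation(std::size_t count)
{
    float* buf = new float[count];   // host array, as in the question
    float* host_copy = buf;          // remember the host address for cleanup

    fakeDeviceMalloc((void**)&buf, count * sizeof(float));  // buf is overwritten

    bool same = (buf == host_copy);  // false: buf no longer points at the host array

    std::free(buf);                  // stands in for cudaFree
    delete[] host_copy;              // only possible because the address was saved
    return same;
}
```

In the code from the question there is no saved copy of the host address, so the `new float[...]` array is unreachable once `cudaMalloc` returns. After `cudaFree`, the pointer still holds the old device address, and passing that to `delete` is undefined behaviour, which is consistent with the access violation reported above.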