CUDA Hidden Memory Leaks

I am working with a fairly large CUDA program that slows down slightly after every execution. Looking through the code, I found several arrays allocated with cudaMalloc that were never freed. After freeing them, the program slows down much less after each execution, but it still slows down a little. I was wondering if:

  1. There is any memory allocated by CUDA or CUFFT API calls that may need to be freed manually.
  2. There is a way to automatically free GPU memory so that any missed cudaFree won’t slow down the program.
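On the first point, cuFFT plans are a common source of allocations that are easy to miss. A plan created with cufftPlan1d (or the other cufftPlan* functions) owns GPU resources, by default including a work area, and those are released only by cufftDestroy. Streams and events behave the same way and are released with cudaStreamDestroy and cudaEventDestroy. The sketch below assumes nvcc with C++11 and linking against cuFFT (-lcufft); the class ScopedFftPlan and the function forward_fft are hypothetical helpers, not part of the cuFFT API. They tie the plan's lifetime to a C++ scope so cufftDestroy cannot be skipped.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical RAII holder for a 1D cuFFT plan: the destructor calls
// cufftDestroy, which frees the GPU resources cuFFT attached to the plan.
class ScopedFftPlan {
public:
    ScopedFftPlan(int nx, cufftType type, int batch) {
        ok_ = (cufftPlan1d(&plan_, nx, type, batch) == CUFFT_SUCCESS);
    }
    ~ScopedFftPlan() {
        if (ok_) cufftDestroy(plan_);
    }
    // Non-copyable, so each plan is destroyed exactly once.
    ScopedFftPlan(const ScopedFftPlan&) = delete;
    ScopedFftPlan& operator=(const ScopedFftPlan&) = delete;

    bool ok() const { return ok_; }
    cufftHandle get() const { return plan_; }

private:
    cufftHandle plan_ = 0;
    bool ok_ = false;
};

// Hypothetical helper: runs one in-place forward C2C FFT of n points
// on zeroed data and returns true on success.
bool forward_fft(int n) {
    cufftComplex* data = nullptr;
    size_t bytes = sizeof(cufftComplex) * static_cast<size_t>(n);
    if (cudaMalloc(&data, bytes) != cudaSuccess) return false;
    bool ok = (cudaMemset(data, 0, bytes) == cudaSuccess);
    if (ok) {
        ScopedFftPlan plan(n, CUFFT_C2C, 1);
        ok = plan.ok()
          && cufftExecC2C(plan.get(), data, data, CUFFT_FORWARD) == CUFFT_SUCCESS
          && cudaDeviceSynchronize() == cudaSuccess;
    }  // plan is destroyed here, before the buffer is freed
    cudaFree(data);
    return ok;
}
```

Calling forward_fft in a loop then creates and destroys one plan per call instead of accumulating plans. If the same transform size is used repeatedly, keeping a single long-lived plan is usually cheaper than re-planning every time.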
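On the second point, the CUDA runtime does not garbage-collect device memory during a run. Allocations are reclaimed only when the context is torn down, either by cudaDeviceReset or when the process exits. In C++ code compiled with nvcc, one common way to make a missed cudaFree impossible is to give every allocation an owner that frees it on scope exit. The sketch below assumes C++14 and the CUDA runtime API; CudaFree, DeviceBuffer, make_device_buffer, fill_and_check and free_device_bytes are hypothetical names for illustration. It uses std::unique_ptr with a custom deleter that calls cudaFree.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <memory>
#include <cassert>

// Deleter that hands device memory back to the CUDA runtime.
struct CudaFree {
    void operator()(void* p) const noexcept { cudaFree(p); }
};

// Owning pointer to a device array: cudaFree runs automatically when the
// buffer goes out of scope, is reset, or is reassigned.
// Note: it must not be dereferenced or indexed on the host.
template <typename T>
using DeviceBuffer = std::unique_ptr<T[], CudaFree>;

// Allocates n elements of T on the device; returns an empty buffer on failure.
template <typename T>
DeviceBuffer<T> make_device_buffer(std::size_t n) {
    T* raw = nullptr;
    if (cudaMalloc(&raw, n * sizeof(T)) != cudaSuccess) return DeviceBuffer<T>();
    return DeviceBuffer<T>(raw);
}

__global__ void fill(float* out, std::size_t n, float value) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = value;
}

// Fills a temporary device buffer and reads back its last element.
// There is no explicit cudaFree: `buf` is released on every return path.
bool fill_and_check(std::size_t n, float value) {
    if (n == 0) return false;
    DeviceBuffer<float> buf = make_device_buffer<float>(n);
    if (!buf) return false;
    unsigned threads = 256;
    unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    fill<<<blocks, threads>>>(buf.get(), n, value);
    if (cudaGetLastError() != cudaSuccess) return false;
    float last = 0.0f;
    // cudaMemcpy waits for the kernel on the default stream to finish.
    if (cudaMemcpy(&last, buf.get() + (n - 1), sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return false;
    return last == value;
}

// Free device memory in bytes, as reported by cudaMemGetInfo.
std::size_t free_device_bytes() {
    std::size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    return free_b;
}
```

Comparing free_device_bytes() before and after running a code path many times is a quick way to check whether that path still leaks. Calling cudaDeviceReset() at the end of a run does release everything the process allocated on the device, but it does not help with memory that accumulates while the program is still running.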