cudaFree not freeing up memory

Hello,
I have a question about cudaFree.
I have written some GPU code, and it works perfectly fine on most computers with GeForce or Quadro cards.
However, on certain GPUs, running exactly the same code leaves all of the GPU memory occupied even after I call cudaFree on every device pointer. cudaFree did not return any error.
What might cause this, and what should I look for when trying to find a solution? Could it be a Windows WDDM problem? Thanks. I just want a clue about where to look.
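
One way to narrow this down is to log the free memory reported by cudaMemGetInfo around the allocation and the free, and to check every return code. Below is a minimal sketch that assumes nothing about the original code: `freeBytes` and `allocFreeReport` are hypothetical helper names, and the buffer size is whatever you pass in. The cudaDeviceSynchronize call is there because cudaFree can also report errors left over from earlier asynchronous kernel launches, so it helps to surface those separately first.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: free device memory in bytes, or 0 if the query fails.
static size_t freeBytes()
{
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess) return 0;
    return freeB;
}

// Hypothetical helper: allocate `bytes`, free it again, and print the free
// memory at each step. Returns the first CUDA error seen (cudaSuccess if none).
cudaError_t allocFreeReport(size_t bytes)
{
    // Surface any error left behind by earlier kernels before touching memory.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("pending error: %s\n", cudaGetErrorString(err));
        return err;
    }

    void *d_buf = nullptr;
    printf("before cudaMalloc: %zu bytes free\n", freeBytes());

    err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("after  cudaMalloc: %zu bytes free\n", freeBytes());

    err = cudaFree(d_buf);
    if (err != cudaSuccess) {
        printf("cudaFree failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("after  cudaFree:   %zu bytes free\n", freeBytes());
    return cudaSuccess;
}
```

Calling `allocFreeReport` once before and once after the kernel in question should show whether the free-memory figure stops recovering only after that kernel has run.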

Just found the reason… The arrays I declared inside the kernel were too large, which somehow corrupted the memory on some GPUs, so cudaFree stopped working.

N.B. on the above:
The interesting thing was that the whole kernel worked and gave the correct result once. On the second run it no longer worked, because the memory was already corrupted.
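
For anyone hitting something similar, here is a rough sketch of how large per-thread arrays could be diagnosed. The original kernel was not posted, so `localScratchKernel`, the 4096-element array, and both helper names are made up for illustration. The idea is to ask cudaFuncGetAttributes how much local memory each thread of the kernel needs, and to check for errors both right after the launch and after cudaDeviceSynchronize, since a failure during execution is only reported at a later call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread owns a large private array. The size
// (4096 floats = 16 KiB per thread) is illustrative, not from the original post.
__global__ void localScratchKernel(float *out, int n)
{
    float scratch[4096];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int k = 0; k < 4096; ++k) scratch[k] = (float)(i + k);
    out[i] = scratch[(i * 7) % 4096];   // data-dependent index into the array
}

// Hypothetical helper: per-thread local memory the compiled kernel requires,
// or 0 if the attribute query fails.
size_t localBytesPerThread()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, localScratchKernel) != cudaSuccess) return 0;
    return attr.localSizeBytes;
}

// Hypothetical helper: launch the kernel and report launch-time and
// execution-time errors separately.
cudaError_t launchAndCheck(float *d_out, int n)
{
    printf("local memory per thread: %zu bytes\n", localBytesPerThread());

    localScratchKernel<<<(n + 127) / 128, 128>>>(d_out, n);

    cudaError_t err = cudaGetLastError();        // bad configuration, etc.
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    err = cudaDeviceSynchronize();               // faults during execution
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

If the per-thread figure is large, one option is to allocate the scratch space once with cudaMalloc and pass a pointer into the kernel instead of declaring the array inside it. Also, an execution error such as an illegal address usually leaves the CUDA context unusable until it is torn down (for example with cudaDeviceReset, or by exiting the process), which may be related to the "works once, fails on the second run" behaviour described above.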