device memory management

In my code there are two consecutive cudaMalloc() calls, and both need a large amount of device memory.
For example:

cudaMalloc(&ptr1, …);
cudaMalloc(&ptr2, …);

If my video card has enough memory for the first cudaMalloc() but not for the second one, the run fails with an error.
But after I reduce the size of both cudaMalloc() calls to something very small, so that both are guaranteed to fit in device memory,
and then recompile and run the code again, it returns the same error.

Is it because, after the first crash, CUDA can’t automatically reclaim the memory allocated to ptr1?
The first run crashed at cudaMalloc(&ptr2, …), so it never reached the cudaFree() for ptr1.

After I reboot the system, everything is fine.
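
To narrow this down, here is a minimal sketch of how the two allocations could be checked, so that the exact error is printed and ptr1 is released when the second allocation fails. The helper name allocPair and the buffer sizes are made up for illustration; they are not from my real code. Printing the error string should show whether the failure on the second run is really an out-of-memory error or something else, and cudaMemGetInfo() reports how much device memory is free before anything is allocated.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original code): allocate two device
// buffers and release the first one if the second allocation fails.
static cudaError_t allocPair(void** p1, size_t n1, void** p2, size_t n2)
{
    *p1 = nullptr;
    *p2 = nullptr;

    cudaError_t err = cudaMalloc(p1, n1);
    if (err != cudaSuccess) {
        fprintf(stderr, "first cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    err = cudaMalloc(p2, n2);
    if (err != cudaSuccess) {
        fprintf(stderr, "second cudaMalloc failed: %s\n", cudaGetErrorString(err));
        cudaFree(*p1);   // do not leave the first buffer allocated on failure
        *p1 = nullptr;
        return err;
    }
    return cudaSuccess;
}

int main()
{
    // Report how much device memory is available before allocating anything.
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu bytes, total: %zu bytes\n", freeBytes, totalBytes);

    // Placeholder sizes, assumed for illustration only.
    const size_t n1 = size_t(64) << 20;   // 64 MiB
    const size_t n2 = size_t(64) << 20;   // 64 MiB

    void* ptr1 = nullptr;
    void* ptr2 = nullptr;
    if (allocPair(&ptr1, n1, &ptr2, n2) != cudaSuccess) {
        return 1;   // the error was already printed by allocPair
    }

    // ... kernels would run here ...

    cudaFree(ptr2);
    cudaFree(ptr1);
    return 0;
}
```

If the free amount printed at startup is already much lower than expected right after a failed run, that would suggest memory is still being held somewhere; if it looks normal and the error string is not an out-of-memory message, the cause is probably something else.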