cudaFree never returns

My code works fine in general, but for some problem dimensions, when it reaches these lines:

cudaFree(d_A);
cudaFree(d_x);

it hangs in cudaFree(d_A) and never returns.

How can I diagnose the problem?

I should mention that the size of memory for d_A is 32768 and for d_x is 2048.

cudaFree may block when d_A is still in use by previously submitted work that has not completed. In that case, the focus should be on understanding what is happening with that earlier work. If cudaFree(d_A) never completes, that could indicate a stuck kernel, for example one caught in an infinite loop.
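One way to check whether the earlier work is the culprit is to synchronize explicitly, with error checking, before the frees. If the hang then moves from cudaFree(d_A) to cudaDeviceSynchronize(), the kernel is what never finishes. The following is only a sketch: the matvec kernel, the CHECK macro, the run_and_free function, the d_y buffer and the 16 x 2048 dimensions are all made up for illustration and are not taken from your code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failing CUDA runtime call and return its error.
// (On an error path this sketch does not free earlier allocations.)
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,  \
                    #call, cudaGetErrorString(err_));                      \
            return err_;                                                   \
        }                                                                  \
    } while (0)

// Hypothetical stand-in for the real kernel: y = A * x, one thread per row.
__global__ void matvec(const float *A, const float *x, float *y,
                       int rows, int cols)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float sum = 0.0f;
    for (int c = 0; c < cols; ++c) sum += A[r * cols + c] * x[c];
    y[r] = sum;
}

cudaError_t run_and_free(int rows, int cols)
{
    float *d_A = nullptr, *d_x = nullptr, *d_y = nullptr;
    CHECK(cudaMalloc(&d_A, sizeof(float) * rows * cols));
    CHECK(cudaMalloc(&d_x, sizeof(float) * cols));
    CHECK(cudaMalloc(&d_y, sizeof(float) * rows));
    CHECK(cudaMemset(d_A, 0, sizeof(float) * rows * cols));
    CHECK(cudaMemset(d_x, 0, sizeof(float) * cols));

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;
    matvec<<<blocks, threads>>>(d_A, d_x, d_y, rows, cols);

    // Catches launch-time problems such as an invalid configuration.
    CHECK(cudaGetLastError());

    // Waits for the kernel. If the program now hangs here instead of in
    // cudaFree, the kernel itself is not completing.
    fprintf(stderr, "waiting for kernel...\n");
    CHECK(cudaDeviceSynchronize());
    fprintf(stderr, "kernel finished, freeing\n");

    CHECK(cudaFree(d_A));
    CHECK(cudaFree(d_x));
    CHECK(cudaFree(d_y));
    return cudaSuccess;
}

int main()
{
    // Dimensions are illustrative only.
    return run_and_free(16, 2048) == cudaSuccess ? 0 : 1;
}
```

If "kernel finished, freeing" never prints for the problematic dimensions, the next step would be to look at the kernel's loops and indexing for those sizes rather than at cudaFree.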

The CUDA runtime API and associated library calls may also misbehave if the host code has corrupted its stack. That would be a defect in the host code, and the usual host-side debugging tools, such as valgrind, may be appropriate.

Otherwise, for additional help here, I usually suggest posting a short, complete example that others can inspect and run.