How do you debug whether your application runs out of local/shared/global memory?

My application seems to run fine for small problem sizes, but it always segfaults or reports an out-of-bounds error once the problem size gets significantly larger. What tools would you recommend for checking which kind of memory the GPU is running out of: per-thread (local), shared, or global?