Segmentation fault on kernel launch

We have used both cudaMalloc and cudaMallocManaged in writing different kernels, and we also use CUDA streams and shared memory in our kernels. The problem is that as soon as we launch a new kernel, we get a segmentation fault. Any suggestions are welcome.
Build with the compile option "-g" and with optimization disabled. Run the program in gdb and get a backtrace after the crash. Alternatively, run it under strace and look at roughly the last 100 lines of output for system call hints (there will be a LOT of output at the start, which can be ignored).
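To make the backtrace easier to read, it can also help to check the return value of every CUDA runtime call, since a failed allocation or launch that goes unnoticed often shows up later as a crash somewhere else. The following is only a minimal sketch and not the original poster's code: the `CUDA_CHECK` macro, the `scale` kernel, and the sizes are made up for illustration. It assumes a build along the lines of `nvcc -g -G -O0 check.cu -o check`, where `-g` adds host debug info and `-G` adds device debug info.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line information if a CUDA call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,       \
                         __LINE__, #call, cudaGetErrorString(err_));       \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Illustrative kernel: multiply each element by a factor, with a bounds check.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;

    // Managed memory may be touched from the host. A pointer returned by
    // plain cudaMalloc must NOT be dereferenced on the host; doing so is a
    // typical source of host-side segmentation faults.
    CUDA_CHECK(cudaMallocManaged(&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) {
        d[i] = 1.0f;
    }

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads, 0, stream>>>(d, n, 2.0f);

    // Launch-time errors (bad configuration, etc.) are reported here.
    CUDA_CHECK(cudaGetLastError());
    // Errors during kernel execution surface at the next synchronizing call.
    CUDA_CHECK(cudaStreamSynchronize(stream));

    std::printf("d[0] = %f\n", d[0]);

    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

If every call reports success and the program still crashes, the gdb backtrace should at least show whether the fault is in host code (for example, dereferencing a cudaMalloc pointer on the CPU) or inside the CUDA runtime itself.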