Error message when stepping out of __global__ function in cuda-gdb

When I try to step out of a __global__ function in cuda-gdb, I get the following internal error:

(cuda-gdb) s
0x00002aaaac219110 in cuVDPAUCtxCreate () from /lib64/libcuda.so.1
(cuda-gdb) s
Single stepping until exit from function cuVDPAUCtxCreate,
which has no line number information.
cuda-gdb/7.12/gdb/infrun.c:2794: internal-error: resume: Assertion `pc_in_thread_step_range (pc, tp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

In my code, the first host statement after the kernel launch is cudaDeviceSynchronize(). This is the backtrace I get at that point:

(cuda-gdb) bt
#0  0x00002aaaac219110 in cuVDPAUCtxCreate () from /lib64/libcuda.so.1
#1  0x00002aaaac219504 in cuVDPAUCtxCreate () from /lib64/libcuda.so.1
#2  0x00002aaaac11e65c in cudbgApiDetach () from /lib64/libcuda.so.1
#3  0x00002aaaac11e810 in cudbgApiDetach () from /lib64/libcuda.so.1
#4  0x00002aaaac052b5a in ?? () from /lib64/libcuda.so.1
#5  0x00002aaaac1a4a9d in cuCtxSynchronize () from /lib64/libcuda.so.1
#6  0x00000000005163ad in cudart::cudaApiDeviceSynchronize() ()
#7  0x000000000053b04d in cudaDeviceSynchronize ()

Does anyone know whether this is a cuda-gdb bug or a problem in my own code? Thank you.

Can you show some code? Do you use any libraries?
Do you check for errors on every call to a CUDA function, like cudaMalloc()?
Please also mention your CUDA version and platform.
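
To illustrate the kind of error checking meant here, below is a minimal sketch, not taken from your code. The helper name cuda_ok is hypothetical, and the kernel is only a placeholder. It assumes the CUDA runtime API and uses cudaGetLastError(), cudaDeviceSynchronize(), and cudaGetErrorString(), which are real runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption for illustration): report a failed
// CUDA runtime call on stderr and tell the caller whether it succeeded.
static bool cuda_ok(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel, standing in for whatever the real code launches.
__global__ void kernel_func() {
    printf("In kernel func\n");
}

int main() {
    kernel_func<<<1, 1>>>();

    // A kernel launch returns no status of its own; launch errors are
    // read back with cudaGetLastError().
    if (!cuda_ok(cudaGetLastError(), "kernel launch")) return 1;

    // Errors raised while the kernel runs surface at the next
    // synchronizing call, so check its return value as well.
    if (!cuda_ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    return 0;
}
```

If both checks pass when the program runs outside the debugger, a runtime error in the program itself becomes a less likely cause of the debugger failure.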

Thank you for your offer to help. Sorry for my late reply.

My original code is too large to share here, so I reproduced the problem with a small test program:

#include <stdio.h>

using namespace std;

__global__
void kernel_func() {
	printf("In kernel func\n");
	return;
}

int main() {
	kernel_func <<<1, 1>>> ();
	cudaDeviceSynchronize();
	return 0;
}

The same error occurs with this snippet:

Thread 1 "simp4gdb" hit Breakpoint 1, kernel_func<<<(1,1,1),(1,1,1)>>> ()
    at simp4gdb.cpp:7
7               printf("In kernel func\n");
(cuda-gdb) n
8               return;
(cuda-gdb) n
0x00002aaaac0763d0 in cuMemGetAttribute_v2 () from /lib64/libcuda.so.1
(cuda-gdb) n
Single stepping until exit from function cuMemGetAttribute_v2,
which has no line number information.
cuda-gdb/7.12/gdb/infrun.c:2794: internal-error: resume: Assertion `pc_in_thread_step_range (pc, tp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

Here is how I compiled this code snippet:

nvcc -x cu -g -G -Xcompiler -rdynamic simp4gdb.cpp -o simp4gdb

The server has CUDA 10.0.130 installed, and the device has compute capability 7.0. The OS is CentOS Linux release 7.6.1810, and the kernel version is 3.10.0-957.el7.x86_64.

I run CUDA on a remote server. Could something be wrong with my cuda-gdb setup?