glutSwapBuffers() is delayed when used with CUDA

When I use OpenGL together with CUDA and display the result through an FBO, the glutSwapBuffers() call takes a very long time to return. The larger the x dimension of the grid in the kernel launch, the longer the delay. Why does this happen? Is it a bug?
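
One way to narrow this down is to check whether the time is really spent in the swap or in the kernel itself. CUDA kernel launches return to the host before the kernel has finished, so the cost can show up in whichever later call first has to wait for the GPU. Below is a minimal sketch of that measurement without any OpenGL. The kernel `busyKernel`, the grid sizes, and the iteration count are hypothetical stand-ins, not the code from my application.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: every thread does some busy work,
// so a larger grid means more total work for the GPU.
__global__ void busyKernel(float *out, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (float)i;
    for (int k = 0; k < iters; ++k)
        v = v * 1.0001f + 0.5f;
    out[i] = v;
}

// Milliseconds elapsed on the host since t0.
static double msSince(std::chrono::steady_clock::time_point t0)
{
    auto dt = std::chrono::steady_clock::now() - t0;
    return std::chrono::duration<double, std::milli>(dt).count();
}

int main()
{
    const int threads = 256;                      // assumed block size
    const int gridSizes[] = {64, 1024, 16384};    // assumed grid x dimensions
    const int iters = 20000;                      // assumed work per thread

    cudaFree(0);  // force context creation so it is not counted in the timings

    for (int gridX : gridSizes) {
        float *d_out = nullptr;
        size_t bytes = sizeof(float) * (size_t)gridX * threads;
        if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
            fprintf(stderr, "cudaMalloc failed\n");
            return 1;
        }

        // Warm-up launch so one-time setup cost is excluded from the measurement.
        busyKernel<<<1, threads>>>(d_out, 1);
        cudaDeviceSynchronize();

        // Time the launch call alone: it only queues the work.
        auto t0 = std::chrono::steady_clock::now();
        busyKernel<<<gridX, threads>>>(d_out, iters);
        double launchMs = msSince(t0);

        // Time the wait: this call blocks until the kernel has finished.
        auto t1 = std::chrono::steady_clock::now();
        cudaError_t err = cudaDeviceSynchronize();
        double syncMs = msSince(t1);

        if (err != cudaSuccess) {
            fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("gridX=%6d  launch=%8.3f ms  sync=%8.3f ms\n",
               gridX, launchMs, syncMs);
        cudaFree(d_out);
    }
    return 0;
}
```

If the sync time grows with the grid size while the launch time stays small, the same test can be applied in the real application: call cudaDeviceSynchronize() just before glutSwapBuffers() and time the two calls separately. If the delay moves into the synchronize call, the swap is probably just waiting for the kernel to finish rather than misbehaving.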