Calling OpenGL functions from a CUDA context

Suppose an algorithm generates an array of 2D vertices (polygons). To evaluate the quality of the generated data, it must rasterize those polygons into an image and then compute the difference between the rendered image and the input image.

After the vertices are generated they are stored in GPU memory, but drawing them requires calling OpenGL functions, which forces me to step out of the CUDA (GPU) context and issue those calls from the host (CPU). In fact those OpenGL functions do nothing except ask the GPU to draw the image, and the GPU already owns all the data it needs for that operation.
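For what it's worth, CUDA/OpenGL interop already removes the round trip for the *data* (though not for the command): you can register an OpenGL vertex buffer with CUDA and let the kernel write into it directly, so the vertices never leave GPU memory. A minimal sketch under those assumptions (the kernel name `generatePolygons` and its body are hypothetical placeholders; a current GL context is assumed and error checking is omitted):

```cuda
// Sketch: create a GL vertex buffer and register it with CUDA so a kernel
// can write 2D vertices straight into it. Assumes a GL context is current;
// GLEW is one common way to load the GL entry points.
#include <GL/glew.h>
#include <cuda_gl_interop.h>

__global__ void generatePolygons(float2* verts, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        verts[i] = make_float2(0.f, 0.f);   // placeholder for the real algorithm
}

GLuint makeSharedVbo(int nVerts, cudaGraphicsResource** resOut) {
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, nVerts * sizeof(float2), nullptr,
                 GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    // One-time registration; per-frame map/unmap is cheap after this.
    cudaGraphicsGLRegisterBuffer(resOut, vbo,
                                 cudaGraphicsRegisterFlagsWriteDiscard);
    return vbo;
}
```

This avoids copying the vertices anywhere, but the draw call itself still has to be issued from the host, which is exactly the asymmetry the question is about.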

The question, in general: why does my GPU require a request from a separate device (the CPU) to do its job (draw the image) from data that it already owns?

Imagine a situation where your CPU has to evaluate sin(3.14), and to do so it is required to call the GPU, which then asks the CPU back to compute that value.

Of course, from the other side: even if we could call some function like cudaDrawPolygons(polygons), it is clear that multiple kernel threads couldn't run such a function in parallel. It would have to be something like __threadSyncCallShader(render) that would suspend CUDA and pass control over the GPU to another context without breaking the previous one.
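Today's closest equivalent to that hypothetical call is a host-side frame loop: map the shared buffer, launch the kernel, unmap, then issue the draw. The CPU only enqueues commands; no vertex data crosses the bus. A hedged sketch, assuming `vbo` is a GL buffer already registered with CUDA as `vboRes` via cudaGraphicsGLRegisterBuffer, and `generatePolygons` is a hypothetical vertex-generating kernel:

```cuda
#include <GL/glew.h>
#include <cuda_gl_interop.h>

// Hypothetical kernel, defined elsewhere.
__global__ void generatePolygons(float2* verts, int n);

void renderFrame(GLuint vbo, cudaGraphicsResource* vboRes, int nVerts) {
    float2* dVerts = nullptr;
    size_t bytes = 0;

    cudaGraphicsMapResources(1, &vboRes, 0);             // lend the buffer to CUDA
    cudaGraphicsResourceGetMappedPointer((void**)&dVerts, &bytes, vboRes);
    generatePolygons<<<(nVerts + 255) / 256, 256>>>(dVerts, nVerts);
    cudaGraphicsUnmapResources(1, &vboRes, 0);           // hand it back to OpenGL

    // These host calls only enqueue GPU commands; the vertex data itself
    // stays in device memory the whole time.
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, nullptr);
    glDrawArrays(GL_TRIANGLES, 0, nVerts);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
```

So the host never touches the vertices; it acts purely as a command processor telling the GPU when to switch between the compute and graphics work, which is roughly the role the imagined __threadSyncCallShader would have played on-device.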