Interpretation of profiler results

From the CUDA 1.0 documentation,

I use the Driver API to solve a PDE on a 1024 x 1024 grid, and plot the results by attaching a buffer to an OpenGL framebuffer.

The profiler gives:

method=[ basicKernel ] gputime=[ 51.968 ] cputime=[ 4325.000 ] occupancy=[ 1.000 ]

I have determined that the cputime is accounted for by the kernel plus the OpenGL commands.

My display routine is listed under MY CODE. If I comment out everything but runCuda(), the profiler returns:

method=[ basicKernel ] gputime=[ 51.360 ] cputime=[ 62.000 ] occupancy=[ 1.000 ]

which indicates that nearly all of the cputime reported in the first case is due to the OpenGL calls. What is confusing is that each line of the profiler output should be associated with a single kernel invocation, yet the CPU time reported seems to be related to the time between successive kernel invocations. Can anybody shed any light on this?
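To make the comparison between the two profiler lines explicit, here is a minimal C sketch that parses lines in the format quoted above and subtracts the cputime values. The names `ProfRecord`, `parse_prof_line` and `cputime_delta` are made up for this illustration and are not part of any CUDA tool, and the parser assumes the kernel name contains no spaces.

```c
#include <stdio.h>
#include <string.h>

/* One record of the profiler log, in the format quoted in this post.
 * Times are kept in whatever units the profiler reports. */
typedef struct {
    char   method[64];
    double gputime;
    double cputime;
    double occupancy;
} ProfRecord;

/* Parse a line such as
 *   method=[ basicKernel ] gputime=[ 51.968 ] cputime=[ 4325.000 ] occupancy=[ 1.000 ]
 * Returns 1 if all four fields were read, 0 otherwise. */
static int parse_prof_line(const char *line, ProfRecord *out)
{
    int n = sscanf(line,
                   " method=[ %63[^] ] ] gputime=[ %lf ] cputime=[ %lf ] occupancy=[ %lf ]",
                   out->method, &out->gputime, &out->cputime, &out->occupancy);
    return n == 4;
}

/* Difference in reported CPU time between two runs of the same kernel,
 * e.g. with and without the OpenGL calls in the display routine. */
static double cputime_delta(const ProfRecord *with_gl, const ProfRecord *without_gl)
{
    return with_gl->cputime - without_gl->cputime;
}
```

Fed the two lines from this post, `cputime_delta` gives 4325.000 - 62.000 = 4263, while the gputime values differ by only 0.608. That is the arithmetic behind the claim that the extra CPU time tracks the OpenGL calls and not the kernel.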

Thanks.

Gordon

MY CODE

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    runCuda(); // kernel invocation

    // set view matrix
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glTranslatef(0.0, 0.0, translate_z);
    glRotatef(rotate_x, 1.0, 0.0, 0.0);
    glRotatef(rotate_y, 0.0, 1.0, 0.0);

    // render from the vbo
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexPointer(4, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glColor4f(1.0, 0.0, 0.0, 0.5);
    glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
    glDisableClientState(GL_VERTEX_ARRAY);

    glutSwapBuffers();
    glutPostRedisplay();
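One way to check where the time goes without relying on the profiler is to time the two halves of the display routine with a host clock. The following is only a sketch under assumptions: `time_frame`, `FrameTiming`, `stub_compute` and `stub_render` are hypothetical names, and the stubs stand in for runCuda() and the OpenGL calls so that the example compiles on its own.

```c
#include <time.h>

/* Wall-clock time in microseconds, using C11 timespec_get. */
static double now_us(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

typedef struct {
    double compute_us;  /* host time spent in the compute phase */
    double render_us;   /* host time spent in the render phase  */
} FrameTiming;

typedef void (*PhaseFn)(void);

/* Run the two phases of one frame back to back and time each separately.
 * In a real program, kernel launches and GL commands are asynchronous, so
 * compute() should end with cuCtxSynchronize() and render() with glFinish();
 * otherwise the host clock only measures how long it took to queue the work. */
static FrameTiming time_frame(PhaseFn compute, PhaseFn render)
{
    FrameTiming t;
    double t0 = now_us();
    compute();
    double t1 = now_us();
    render();
    double t2 = now_us();
    t.compute_us = t1 - t0;
    t.render_us  = t2 - t1;
    return t;
}

/* Hypothetical stand-ins for runCuda() and the OpenGL section:
 * they just burn host cycles so the sketch is self-contained. */
static volatile unsigned long sink;

static void spin(unsigned long n)
{
    for (unsigned long i = 0; i < n; ++i)
        sink += i;
}

static void stub_compute(void) { spin(100000UL); }
static void stub_render(void)  { spin(5000000UL); }
```

Calling `time_frame(stub_compute, stub_render)` once per frame and printing the two fields would show how the frame time splits between the phases. With the real routines substituted in, a large `render_us` next to a small `compute_us` would support the reading that the profiler's cputime is picking up the time between kernel invocations.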

Could you give some more information on how the CPU time is related to the time between the kernel calls?

What specifically do you require? After all, how could the CUDA time be affected by the GL commands in the code I provided? If the CPU time returned by the profiler changes to such a degree (62 -> 4325) simply by adding the GL commands, then the GL commands must be responsible for the increase in time. They are the only commands (of significance) between the successive calls to the CUDA kernel (contained within runCuda()).

thanks,

Gordon

Maybe this topic will answer your question?