Hello everyone, new guy here.
I’m having serious performance issues in my code when acquiring resources shared with OpenGL.
I have a kernel that does absolutely nothing:
[codebox]kernel void foo()
{
}
[/codebox]
Inside my display function I just do:
[codebox]void display(void)
{
    // Bind the empty kernel once: global and local work size are both 1x1
    static cl::KernelFunctor foo = foo_kernel.bind(queue, cl::NDRange(1, 1), cl::NDRange(1, 1));

    queue.enqueueAcquireGLObjects(&shared_objects);
    foo(); // run the no-op kernel
    queue.enqueueReleaseGLObjects(&shared_objects);

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glutSolidTeapot(1.0f);
    glutSwapBuffers();
}[/codebox]
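(As a side note: without the cl_khr_gl_event extension, the interop spec leaves synchronization between the two APIs to the application, so strictly speaking the interop section should be bracketed with explicit finishes. A sketch of that variant, using the same globals as above — the timings below are for the version without the finishes:)
[codebox]void display(void)
{
    static cl::KernelFunctor foo = foo_kernel.bind(queue, cl::NDRange(1, 1), cl::NDRange(1, 1));

    glFinish();     // GL must be done with the shared objects before CL acquires them
    queue.enqueueAcquireGLObjects(&shared_objects);
    foo();
    queue.enqueueReleaseGLObjects(&shared_objects);
    queue.finish(); // CL must be done before GL touches the objects again

    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glutSolidTeapot(1.0f);
    glutSwapBuffers();
}[/codebox]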
The shared_objects vector contains a single cl::Image2DGL, created as follows:
[codebox]glEnable(GL_TEXTURE_2D);
glGenTextures(1, &shared_tex);
glBindTexture(GL_TEXTURE_2D, shared_tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

// Allocate storage only; no texel data is ever uploaded
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, TEXTURE_WIDTH,
             TEXTURE_HEIGHT, 0, GL_RGBA, GL_FLOAT, NULL);

// Wrap mipmap level 0 of the texture as a CL image for interop
shared = cl::Image2DGL(context, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, shared_tex);
shared_objects.push_back(shared);

glBindTexture(GL_TEXTURE_2D, 0);[/codebox]
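The creation itself appears to succeed (the program runs and renders). For reference, the wrap can also be error-checked through the optional error-return parameter of the cl.hpp Image2DGL constructor; a sketch:
[codebox]cl_int err = CL_SUCCESS;
shared = cl::Image2DGL(context, CL_MEM_READ_WRITE, GL_TEXTURE_2D,
                       0 /* miplevel */, shared_tex, &err);
// CL_INVALID_GL_OBJECT here would point at a problem with the texture setup
shared_objects.push_back(shared);[/codebox]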
As a baseline, drawing just the teapot (without the acquire/release around the kernel execution) takes about 3.61 ms per frame.
With the acquire/release in place, however, the frame time scales with the size of the texture, even though the texture is never actually used, neither by OpenCL nor by OpenGL:
Texture size (texels) - Frame time (ms)
256  - ~4.36
1K   - ~4.67
4K   - ~4.9
16K  - ~4.5
64K  - ~4.4
256K - ~4.9
1M   - ~6.1
4M   - ~15.2
16M  - ~57.3
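To narrow down where the time goes, I suppose the acquire/release commands themselves could be profiled with CL events; a sketch of the interop section of display(), assuming the queue was created with CL_QUEUE_PROFILING_ENABLE:
[codebox]// Sketch: requires a profiling queue, e.g.
//   cl::CommandQueue queue(context, device, CL_QUEUE_PROFILING_ENABLE);
// (needs <cstdio> for printf)
cl::Event acquire_evt, release_evt;
queue.enqueueAcquireGLObjects(&shared_objects, NULL, &acquire_evt);
foo();
queue.enqueueReleaseGLObjects(&shared_objects, NULL, &release_evt);
queue.finish(); // make sure both commands have completed before reading the events

cl_ulong acq_ns = acquire_evt.getProfilingInfo<CL_PROFILING_COMMAND_END>()
                - acquire_evt.getProfilingInfo<CL_PROFILING_COMMAND_START>();
cl_ulong rel_ns = release_evt.getProfilingInfo<CL_PROFILING_COMMAND_END>()
                - release_evt.getProfilingInfo<CL_PROFILING_COMMAND_START>();
printf("acquire: %.3f ms, release: %.3f ms\n", acq_ns * 1e-6, rel_ns * 1e-6);[/codebox]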
System info: NVIDIA GTX 465 1 GB (driver 258.96) on an Intel Core 2 Duo @ 2.66 GHz with 2 GB RAM.
As you can see, beyond 1M texels the acquire/release really slows the application down. What is the driver doing? Is it copying the data between memory regions?
Is this normal behavior, or am I doing something wrong?
Thanks