Drawing a PBO to the screen: performance

Hi,

I’m currently using a PBO to draw into (from a CUDA kernel).

Once done, I draw it the following way, but it uses 20% of my CPU!

I’m looking for a more efficient way to render my PBO.

glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, _pboId);

// Disable all the state we don't need here
glDisable(GL_DEPTH_TEST);
glDisable(GL_COLOR_LOGIC_OP);
glDisable(GL_CULL_FACE);
glDisable(GL_BLEND);
glDisable(GL_DITHER);
glDisable(GL_MULTISAMPLE);
glDisable(GL_SCISSOR_TEST);
glDisable(GL_STENCIL_TEST);

glDrawPixels(_width, _height, GL_RGBA, GL_UNSIGNED_BYTE, 0);	// Uses the CPU!
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

I have also tried the following, but nothing appears:

glEnable(GL_TEXTURE_2D);
glActiveTexture(GL_TEXTURE0);
if (_pboTextureId < 0)
{
	glGenTextures(1, (GLuint*)&_pboTextureId);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, _width, _height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
}

glBindTexture(GL_TEXTURE_2D, _pboTextureId);		// Bind the texture
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, _pboId);	// Bind the PBO

// Copy pixels from the PBO to the texture object
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, _width, _height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// Draw the texture
glBegin(GL_QUADS);
glTexCoord2f(0.0, 0.0);
glVertex3f(0.0, 1.0, 0.0);
glTexCoord2f(1.0, 0.0);
glVertex3f(1.0, 1.0, 0.0);
glTexCoord2f(1.0, 1.0);
glVertex3f(1.0, 0.0, 0.0);
glTexCoord2f(0.0, 1.0);
glVertex3f(0.0, 0.0, 0.0);
glEnd();

glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
glBindTexture(GL_TEXTURE_2D, 0);

glDisable(GL_TEXTURE_2D);

Does someone have an idea how to solve this problem?

Thx

The texture path should be the fast one.
Possible issues in your code:

  • Default glTexEnv is modulate. Try adding glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);
  • Default projection and modelview matrices are identity, which matches a glOrtho() setup of the unit cube, so vertex xy-coordinates should range from -1.0f to 1.0f to fill the viewport.
  • Did you setup the glTexImage2D once before using glTexSubImage2D?
  • Your quad winding is clockwise, while the default front-face mode is counter-clockwise. It should still render since you have culling disabled.
  • Using GL_RGBA8 as internal format and GL_BGRA as user format during the glTex(Sub)Image2D calls should be faster than GL_RGBA as user format, because the data is stored in BGRA order internally. (Not so for RGBA32F.)

I use something like this to display a tonemapped HDR buffer:

//One time init
  glGenBuffers(1, &m_pboOutputBuffer);

  // This happens during client window resize as well.
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_pboOutputBuffer);
  glBufferData(GL_PIXEL_UNPACK_BUFFER, m_width * m_height * sizeof(float) * 4, nullptr, GL_STREAM_READ); // Here: RGBA32F
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

  // glPixelStorei(GL_UNPACK_ALIGNMENT, 4); // default, works for BGRA8 and RGBA32F.

  glActiveTexture(GL_TEXTURE0);
  // This code uses the default texture object 0; add your own texture object if needed.
  glBindTexture(GL_TEXTURE_2D, 0);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE); // Don't sample the border color.
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

  glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE); // Not a texture parameter; the default is modulate.

// Per paint event:
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_pboOutputBuffer);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, (GLsizei) m_width, (GLsizei) m_height, 0, GL_RGBA, GL_FLOAT, nullptr);
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

  // A GLSL shader applied during the texture blit. Without a shader, put glEnable(GL_TEXTURE_2D) here instead.
  glUseProgram(m_tonemapperProgram);
    
  glBegin(GL_QUADS); // CCW winding
    glTexCoord2f(0.0f, 0.0f);
    glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f);
    glVertex2f(1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f);
    glVertex2f(1.0f, 1.0f);
    glTexCoord2f(0.0f, 1.0f);
    glVertex2f(-1.0f, 1.0f);
  glEnd();

  glUseProgram(0);

Thanks a lot,

I have tried your code (without the glUseProgram…) and got the same effect :-(

BTW:

  1. Yes, I call glTexImage2D once before using glTexSubImage2D (I will update the code right now)
  2. The modelview & projection matrices are identity
  3. I use glOrtho(0, width, 0, height, -1, 1)
  4. At the beginning I change the background color with:
    glClearColor(0.25f, 0, 0, 1); // Dark red background
    glClear(GL_COLOR_BUFFER_BIT);
    But on screen the OpenGL zone is still red!

I still can't see why it does not work!

Thanks

I finally have a “white” zone… which should correspond to my “quad”. But it is “white” and does not contain the colors of the PBO.

Any idea ?

In your initialization code you did not bind the texture before setting its texture parameters. Those are per texture object, so your texture object was left on the default filter modes, whose min-filter is GL_NEAREST_MIPMAP_LINEAR. That needs mipmaps to work; if you don’t provide any, the texture image is incomplete and the texture unit it’s bound to is switched off.

Should have been:

glGenTextures(1, (GLuint*)&_pboTextureId);
glBindTexture(GL_TEXTURE_2D, _pboTextureId); // <== Bind the texture object!!
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); // These affect the currently bound texture object.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, _width, _height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

Great,

I had forgotten to bind my texture just after creation! A stupid bug :-P

Why do you use a shader to render the texture?

Just because my code was taken from a program that renders a high dynamic range image into an RGBA32F texture, which needs to be tone-mapped for the final display; that’s done in the GLSL shader as a post-process. (I didn’t want to introduce errors by posting changed code without running it.)

Ahhh, I see

Thanks for your answer…

So, this approach works but still uses 25% of my CPU :-(

In fact I render from an OpenCL kernel, on multiple GPUs. Then I would like to average the images (one per GPU) and display the result. Even without the “averaging”, the CPU usage is high!

I have tried several approaches:

  • glDraw…
  • Rendering directly into a texture and displaying it (as a quad)
  • Rendering into a PBO, transferring it to a texture, and displaying the texture
  • I have also played a lot with the image formats (RGBA8, etc.)

But no luck… it still uses 25% of my CPU :-(

I have just seen the following thread:
https://devtalk.nvidia.com/default/topic/540366/opengl/opengl-vsync-swapbuffers-100-cpu-core/

Is it possible that the CPU usage is due to the fact that I “swap” in another thread?

Does nobody have an idea to help me?

Thanks a lot

Very good question, added to my subscriptions.