Poor OpenGL rendering: software mode?


When I turn on blending in part of my rendering code, the rendering time jumps from 0.01 sec to 5 sec. Do you have any idea what could cause this? Does the NVIDIA driver sometimes rely on software rendering?

card: Quadro 4000
driver: 306.96
rendering target: FBO texture, GL_LUMINANCE_ALPHA32F_ARB


I am afraid this is too little information to say anything definite. What are you rendering (number of polys, polygon size on screen and so on; a brief description would go a long way)? What resolution are you rendering at? Are you blending everything? Are you using any depth tests? What color depth are you rendering? Have you checked for overdraw? Are there any shaders running (the fragment shaders are the most interesting in this case)? There is probably a lot more that might be relevant; the more information the better. As of now, all I can say is that a 500 times slower rendering time is pretty insane…

Good luck, and please post some more information.

The whole thing is very “sane”, all the numbers are reasonable:

  • What’s rendered is a depth map; the grid size is about 300x200. I am currently rendering it as GL_POINTS (but it is equally slow when using GL_QUADS)
  • the whole thing is blended using glBlendFunc(GL_ONE, GL_ONE);
  • the FBO’s resolution is 512x512
  • no depth test
  • very simple shaders (some transform in the vertex shader, next to nothing in the fragment shader)
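
To make the blending setup above concrete, here is a minimal CPU-side sketch of what glBlendFunc(GL_ONE, GL_ONE) computes per fragment, assuming the default GL_FUNC_ADD blend equation. The names PixelLA, blend_one_one and splat_points are made up for illustration; this is not my actual rendering code, just the arithmetic the GPU is expected to perform.

```c
#include <stddef.h>

/* Two-channel float pixel, standing in for a luminance/alpha target
   (hypothetical layout, for illustration only). */
typedef struct { float l, a; } PixelLA;

/* Additive blend: result = src * ONE + dst * ONE, per channel. */
void blend_one_one(PixelLA *dst, PixelLA src) {
    dst->l = src.l * 1.0f + dst->l * 1.0f;
    dst->a = src.a * 1.0f + dst->a * 1.0f;
}

/* Splat n one-pixel points into a w*h buffer with additive blending.
   Points outside the buffer are skipped, roughly like clipping.
   Returns the number of fragments actually blended. */
size_t splat_points(PixelLA *fb, int w, int h,
                    const int *xs, const int *ys, size_t n, PixelLA src) {
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
        if (xs[i] < 0 || xs[i] >= w || ys[i] < 0 || ys[i] >= h) {
            continue;
        }
        blend_one_one(&fb[(size_t)ys[i] * (size_t)w + (size_t)xs[i]], src);
        ++written;
    }
    return written;
}
```

With about 300x200 = 60,000 points of size 1 and a 512x512 target, this is at most 60,000 read-modify-write operations per frame, which is why the numbers look reasonable to me.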

Interestingly, I have the option of rendering this using an auxiliary thread (using a dedicated OpenGL context). Because the whole thing is so slow, I can actually display the texture attached to the FBO in “real time” while it is being rendered into. This is really funny: it looks like an old Amstrad game loading, with the image rendered “line by line”. The lines seem to be the depth map lines though, not the FBO lines. Please note the problem is also there when I use just one thread (and no parallel visualization of the FBO).

Would you mind clarifying this one for me: can the NVIDIA driver switch to a “software rendering” mode at all in some corner case not covered by the hardware?


A simple fragment shader: does that mean no texture samples and just outputting a single color? If not, how many instructions and texture samples?

I still have a hard time understanding what you are rendering. A depth map? To me that is a texture, not something you draw with GL_POINTS; maybe a single quad with GL_QUADS if you want to do a full-screen pass or something? (GL_QUADS, so I guess you are using OpenGL 2.x? As far as I know it was deprecated in 3.0.)

It might help if you could post image/images showing us what you are drawing and what you want to achieve.

Are you using a single glDrawElements call with GL_POINTS, or multiple calls? (I am guessing one call with a count of 60 000?)

If there actually are 60 000 points being drawn, how many of them cover the same pixels, and how large is the point size?

Are you using any AA, like MSAA or SSAA?

The only thing I can think of that could drastically increase the render time just by enabling blending is a large amount of overdraw. But I cannot imagine a x500 increase in time. Which brings me to: is there some other difference in what you draw when you get the 0.01 sec time? I guess you got that result when you used depth write and depth test, and no blending? What happens if you enable blending but keep depth write and depth test? How much overdraw do you get then? (Are the closest polygons drawn before the ones further away? Because then it is not only the blend mode that changes the time, it is also that a lot more fragments get processed once early Z no longer rejects them.)

Sorry for not helping that much, I’m mostly just asking a lot of questions ^^;
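
One way to put a number on overdraw is to count how many fragments land on each pixel. Below is a rough CPU-side sketch of that idea, assuming you can get the projected pixel coordinates of your points; the names OverdrawStats and count_overdraw are made up for this example. On the GPU the usual equivalent is to render a constant value with additive blending and read back the result.

```c
#include <stddef.h>

/* Summary of how fragments are distributed over the target (hypothetical type). */
typedef struct {
    unsigned max_hits;   /* largest number of fragments on any single pixel */
    size_t covered;      /* number of pixels hit at least once */
    size_t fragments;    /* number of fragments that landed inside the target */
} OverdrawStats;

/* Count fragments per pixel for n one-pixel points.
   `hits` must hold w*h counters and be zeroed by the caller. */
OverdrawStats count_overdraw(unsigned *hits, int w, int h,
                             const int *xs, const int *ys, size_t n) {
    OverdrawStats s = {0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (xs[i] < 0 || xs[i] >= w || ys[i] < 0 || ys[i] >= h) {
            continue; /* outside the target, nothing is drawn */
        }
        unsigned *p = &hits[(size_t)ys[i] * (size_t)w + (size_t)xs[i]];
        if (*p == 0) {
            s.covered++;
        }
        (*p)++;
        if (*p > s.max_hits) {
            s.max_hits = *p;
        }
        s.fragments++;
    }
    return s;
}
```

If fragments divided by covered comes out close to 1, overdraw is probably not the explanation for the slowdown.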

Sorry, I forgot to address a question:
“Would you mind clarifying this one for me: can the NVIDIA driver switch to a “software rendering” mode at all in some corner case not covered by the hardware?”

I mostly use OpenGL 3.2 and have tried to leave the older things behind me. I have never used any of the software rendering stuff, as I have always approached OpenGL from a game programming perspective, where software rendering would be far too slow. I would however be greatly surprised if the driver were to switch over automatically. And as far as I know, just rendering GL_POINTS with a simple blending mode would not be any kind of “corner case”?

What hardware are you using?

Thanks for your support!

I am using OpenGL version 4.2.0. The fragment shader outputs a single color. Each pixel of the depth map texture is turned into a 3D point using simple geometric transformations in the vertex shader. I use a single call to glDrawElements. Overlap is low, with almost no self-occlusion, and the point size is 1. No AA of any kind.

Now the interesting bit:

Reading your reply inspired me to do some further testing. The slowdown seems to be caused by the internal format of the texture render target: GL_LUMINANCE_ALPHA32F_ARB
I tried a couple of different type/format combinations and it seems that the culprit is the format (GL_LUMINANCE_ALPHA), which is slow regardless of the type (GL_FLOAT or GL_UNSIGNED_BYTE). Any other format is fast (GL_LUMINANCE, GL_RGB and GL_RGBA)… Maybe this is the “corner case”?
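
Something I have not tested yet, so take it as a guess: the luminance formats are legacy ones, and OpenGL 3.0 (ARB_texture_rg) added dedicated one- and two-channel formats such as GL_R32F and GL_RG32F, which might be a drop-in replacement for a two-channel float target. The sketch below just maps the legacy internal formats to those candidates. The function suggest_rg_format and the MY_-prefixed names are made up; the values are copied from the standard GL headers and defined locally so the snippet compiles on its own.

```c
/* Enum values as found in the standard OpenGL headers, redefined locally
   (MY_ prefix) so this compiles without a GL header or context. */
#define MY_GL_LUMINANCE32F_ARB        0x8818u
#define MY_GL_LUMINANCE_ALPHA32F_ARB  0x8819u
#define MY_GL_R32F                    0x822Eu
#define MY_GL_RG32F                   0x8230u

/* Hypothetical helper: return a candidate RG-family internal format for a
   legacy luminance format, or the input unchanged if there is no mapping. */
unsigned suggest_rg_format(unsigned internal_format) {
    switch (internal_format) {
    case MY_GL_LUMINANCE32F_ARB:
        return MY_GL_R32F;   /* one channel: luminance -> red */
    case MY_GL_LUMINANCE_ALPHA32F_ARB:
        return MY_GL_RG32F;  /* two channels: luminance, alpha -> red, green */
    default:
        return internal_format;
    }
}
```

If this works, the fragment shader would need to write the second value to the green channel instead of alpha, and anything sampling the texture afterwards would have to read .rg rather than luminance/alpha.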

That is interesting, and good to know! ^^