glVertex3f crashes when invoked in a large glBegin/glEnd block within a display list

Dear Folks @ NVIDIA,

I experienced a crash in ParaView (http://paraview.org).

I was able to write a simple OpenGL program that reproduces the issue.

Here is the relevant snippet:

    int i, n = 98454384;
    float x = 0.5f, y = -0.5f, z = 0, delta = 100/(float)n ;
    fprintf(stderr,"start render\n");
    glClear( GL_COLOR_BUFFER_BIT );

    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);
    glBegin( GL_TRIANGLE_STRIP );
    /* add the first two points */
    glNormal3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( x, y, 0.0f );
    glNormal3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( -x, y, 0.0f );
    i = 2;
    while (i < n) {
        glNormal3f( 1.0f, 1.0f, 1.0f);
        glVertex3f( x, y, 0.0f );
        y += delta; i++;
        glNormal3f( 1.0f, 1.0f, 1.0f);
        glVertex3f( -x, y, 0.0f );
        y += delta; i++;
    }
    glEnd();
    glEndList();
    glCallList(list);
    glDeleteLists(list,1);
    fprintf(stderr,"stop render\n");

The very last call to glVertex3f crashes.

Here is the gdb stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x0000003b80089c57 in memcpy () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install SDL-1.2.14-3.el6.x86_64 glibc-2.12-1.107.el6_4.5.x86_64 libX11-1.5.0-4.el6.x86_64 libXau-1.0.6-4.el6.x86_64 libXcursor-1.1.13-2.el6.x86_64 libXext-1.3.1-2.el6.x86_64 libXfixes-5.0-3.el6.x86_64 libXrandr-1.4.0-1.el6.x86_64 libXrender-0.9.7-2.el6.x86_64 libgcc-4.4.7-4.el6.x86_64 libstdc++-4.4.7-4.el6.x86_64 libxcb-1.8.1-1.el6.x86_64 mesa-libGLU-9.0-0.8.el6_4.3.x86_64
(gdb) bt
#0  0x0000003b80089c57 in memcpy () from /lib64/libc.so.6
#1  0x00007ffff68cfaf6 in ?? () from /usr/lib64/libnvidia-glcore.so.334.16
#2  0x00007ffff68cfbae in ?? () from /usr/lib64/libnvidia-glcore.so.334.16
#3  0x0000000000400fe3 in render () at ubug.cpp:59
#4  0x0000000000401076 in main (argc=1, argv=0x7fffffffe098) at ubug.cpp:75

When I run the application under the DDT debugger with memory debugging enabled, I get an error message stating that memcpy is trying to write past the end of an allocated memory area.

So far I have been able to reproduce the bug on a RHEL6-like distribution, on an x86_64 processor, with an NVIDIA Quadro K5000 and the latest available drivers: 331.49 and 334.16 (beta).

I can run the program without any issue under TigerVNC with the Mesa implementation of OpenGL, but it crashes when running on an NVIDIA card.

The full source code is below. To compile:
g++ -g -o SDL2 -DNOTIMER ubug.cpp -lSDL -lGLU -lGL

#include <unistd.h>
#include <SDL/SDL.h>
#include <SDL/SDL_opengl.h>

const int SCREEN_WIDTH = 640;
const int SCREEN_HEIGHT = 480;
const int SCREEN_BPP = 32;

bool initGL()
{
    glMatrixMode( GL_PROJECTION );
    glLoadIdentity();
    glMatrixMode( GL_MODELVIEW );
    glLoadIdentity();
    glClearColor( 0.f, 0.f, 0.f, 1.f );
    GLenum error = glGetError();
    if( error != GL_NO_ERROR )
    {
        printf( "Error initializing OpenGL! %s\n", gluErrorString( error ) );
        return false;
    }
    return true;
}

bool init()
{
    if( SDL_Init( SDL_INIT_VIDEO) < 0 )
        return false;
    if( SDL_SetVideoMode( SCREEN_WIDTH, SCREEN_HEIGHT, SCREEN_BPP, SDL_OPENGL ) == NULL )
        return false;
    SDL_EnableUNICODE( SDL_TRUE );
    if( initGL() == false )
        return false;
    SDL_WM_SetCaption( "OpenGL Test", NULL );
    return true;
}

void render()
{
    int i, n = 98454384;  /* ~98 million vertices in a single strip */
    float x = 0.5f, y = -0.5f, z = 0, delta = 100/(float)n ;
    fprintf(stderr,"start render\n");
    glClear( GL_COLOR_BUFFER_BIT );

    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);
    glBegin( GL_TRIANGLE_STRIP );
    /* add the first two points */
    glNormal3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( x, y, 0.0f );
    glNormal3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( -x, y, 0.0f );
    i = 2;
    while (i < n) {
        glNormal3f( 1.0f, 1.0f, 1.0f);
        glVertex3f( x, y, 0.0f );
        y += delta; i++;
        glNormal3f( 1.0f, 1.0f, 1.0f);
        glVertex3f( -x, y, 0.0f );
        y += delta; i++;
    }
    glEnd();
    glEndList();
    glCallList(list);
    glDeleteLists(list,1);
    fprintf(stderr,"stop render\n");
    SDL_GL_SwapBuffers();
}

int main( int argc, char *argv[] )
{
    if( init() == false )
        return 1;
    for ( ; ; ) {
        render();
        sleep(1);
    }
    return 0;
}

I was able to get correct output by rewriting the render routine to use several smaller glBegin/glEnd blocks:

void render()
{
    int i, n=98454384;
    float x = 0.5f, y = -0.5f, z = 0, delta = 100/(float)n ;
    fprintf(stderr,"start render\n");
    glClear( GL_COLOR_BUFFER_BIT );

    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);
    glBegin( GL_TRIANGLE_STRIP );
    /* add the first two points */
    glNormal3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( x, y, 0.0f );
    glNormal3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( -x, y, 0.0f );
    i = 2;
    while (i < n) {
        /* every 499998 vertices, close the current block and open a new one */
        if (i % 499998 == 0) {
            /* rewind and replay the last two vertices after glBegin so the
               strip stays connected across the split */
            i -= 2;
            y -= 2*delta;
            glEnd();
            glBegin( GL_TRIANGLE_STRIP );
            glNormal3f( 1.0f, 1.0f, 1.0f);
            glVertex3f( x, y, 0.0f );
            y += delta; i++;
            glNormal3f( 1.0f, 1.0f, 1.0f);
            glVertex3f( -x, y, 0.0f );
            y += delta; i++;
        }
        glNormal3f( 1.0f, 1.0f, 1.0f);
        glVertex3f( x, y, 0.0f );
        y += delta; i++;
        glNormal3f( 1.0f, 1.0f, 1.0f);
        glVertex3f( -x, y, 0.0f );
        y += delta; i++;
    }
    glEnd();
    glEndList();
    glCallList(list);
    glDeleteLists(list,1);
    fprintf(stderr,"stop render\n");
    SDL_GL_SwapBuffers();
}

That being said, this approach is virtually impossible to implement correctly within ParaView, so I hope this bug can be fixed soon.

Thanks and regards,

Gilles

I’ll forward this to the OpenGL driver team.

In general, it is not a good idea to build primitives with such a huge number of vertices per primitive (here ~98 million).
That has disadvantages for building the display list, for frustum culling, for transferring the data to the GPU, and for rendering; it is really not going to perform optimally even if it worked.
Splitting the huge chunk into smaller primitives is actually a recommended technique for better performance. You picked one million, which I would have used as well.

Does this mean your application normally uses immediate mode with this number of vertices, and calls one of the render functions inside glNewList()/glEndList() to compile the data, so you are unable to chunk it into multiple primitives?

Thanks for forwarding this issue to the OpenGL driver team!

I understand this is not the most optimal way to do things.
Unfortunately, ParaView is a pretty big C++ application (~6 million lines, none of which I wrote).

The “big” loop will call
vtkOpenGLPainterDeviceAdapter::SendAttribute(int index, int numcomp,
int type, const void *attribute, vtkIdType offset /*=0*/)
at least twice per iteration, and not directly.
The index argument selects which GL function gets called (glVertex, glNormal, glColor, glTexCoord, …).

I was able to patch this function, but it is really an ugly hack.
glEnd()/glBegin() pairs can be inserted only every 4 vertices when drawing GL_QUADS,
every 3 vertices when drawing GL_TRIANGLES, and only on an even vertex count when drawing
GL_TRIANGLE_STRIP (where the last two vertices must also be replayed after glBegin), and so on.
A rough sketch of this bookkeeping is below.
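
For illustration, here is a minimal sketch of such a wrapper (hypothetical code, not the actual patch; it handles positions only, while the real SendAttribute path would also have to buffer normals, colors and texture coordinates):

#include <GL/gl.h>

struct ChunkedEmitter {
    GLenum  mode;        /* GL_TRIANGLES, GL_QUADS, GL_TRIANGLE_STRIP, ... */
    int     count;       /* vertices emitted since the last glBegin */
    int     limit;       /* split threshold, e.g. 499998 as above */
    GLfloat prev[2][3];  /* the two most recent vertices, for strip restarts */

    void begin(GLenum m, int lim) { mode = m; limit = lim; count = 0; glBegin(m); }
    void end() { glEnd(); }

    void vertex(GLfloat x, GLfloat y, GLfloat z) {
        /* remember the last two vertices in case a strip restart is needed */
        for (int k = 0; k < 3; ++k) prev[0][k] = prev[1][k];
        prev[1][0] = x; prev[1][1] = y; prev[1][2] = z;
        glVertex3f(x, y, z);
        ++count;
        if (count < limit) return;
        /* a split is only legal on a primitive boundary of the current mode */
        switch (mode) {
        case GL_TRIANGLES:      if (count % 3 != 0) return; break; /* whole triangles */
        case GL_QUADS:          if (count % 4 != 0) return; break; /* whole quads */
        case GL_TRIANGLE_STRIP: if (count % 2 != 0) return; break; /* preserve winding */
        default:                return; /* other modes left out of this sketch */
        }
        glEnd();
        glBegin(mode);
        count = 0;
        if (mode == GL_TRIANGLE_STRIP) {
            /* replay the last two vertices so the strip stays connected */
            glVertex3fv(prev[0]);
            glVertex3fv(prev[1]);
            count = 2;
        }
    }
};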

So, bottom line: I fully understand the benefits of building primitives with a reasonable number of vertices, but in the case of ParaView that might require some heavy re-engineering.

The problem with this reproducer is fixed and the fix will be available in a future driver, but sending one triangle strip with hundreds of millions of vertices is not solved in general: sending even more vertices, or more attributes per vertex, will then result in out-of-memory errors.
Splitting the triangle strips into smaller primitives or using independent primitives would be the better approach; a sketch of the idea follows.
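
For illustration, a minimal sketch of that approach (hypothetical code, not supplied by the driver team): the same kind of zig-zag geometry stored once in a client-side vertex array and drawn in bounded batches, restarting each strip on an even vertex index with a two-vertex overlap. Note that at this vertex count the array itself is ~1.2 GB, so real code would likely also chunk the allocation or move the data into buffer objects.

#include <stdlib.h>
#include <GL/gl.h>

static void render_strip_with_arrays(int n) /* n = total vertex count */
{
    float x = 0.5f, y = -0.5f, delta = 100.0f / (float)n;
    GLfloat *verts = (GLfloat *)malloc((size_t)n * 3 * sizeof(GLfloat));
    for (int i = 0; i < n; ++i) {   /* zig-zag ribbon like the reproducer's */
        verts[3*i + 0] = (i % 2 == 0) ? x : -x;
        verts[3*i + 1] = y;
        verts[3*i + 2] = 0.0f;
        y += delta;
    }
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, verts);
    /* draw in ~1M-vertex windows; each window starts 2 vertices before the
       previous one ended so consecutive strips share an edge, and the start
       index stays even so the triangle winding is preserved */
    const int batch = 1000000;
    for (int first = 0; first < n; first += batch - 2) {
        int cnt = (n - first < batch) ? (n - first) : batch;
        if (cnt < 3) break;         /* not enough vertices left for a triangle */
        glDrawArrays(GL_TRIANGLE_STRIP, first, cnt);
    }
    glDisableClientState(GL_VERTEX_ARRAY);
    free(verts);
}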