gpgpu_fluid on 8800

I just got an 8800 GTX and installed it on Windows XP x64. All the old demos run fine, with the exception of the gpgpu_fluid demo. I’ve tried installing various driver versions with no luck.

Is the gpgpu_fluid demo deprecated for the NV80? Is it a driver bug? Am I doing something wrong? It ran fine under 64 bit XP with a 7900.

I realize there is a new CUDA demo that does (almost) the same thing, but I wanted to see how much faster it ran relative to the 7900.

The beta release does not support 64-bit operating systems. Does this failure persist with 32-bit WinXP?

Yes, I’m aware CUDA is unsupported in 64-bit XP, which is why I tried the gpgpu_fluid demo, which only uses Cg. Is Cg also unsupported on 64-bit systems? This seems unlikely.

Hi Ted,

That’s my demo… sorry about this bug! Unfortunately there was a bug in the old version of gpgpu_fluid and we haven’t put it into SDK10 for OpenGL yet (I’ve been too busy with CUDA!).

I will try to get a new version out soon. In the meantime, here are some changes you can make that I think should fix it.

In the file “flo.cpp”, in the Flo::Initialize() function, replace this:

   _pScalarBuffer = new PBuffer("r float=32f depth=24");

With this:

   if (glutExtensionSupported("GL_NV_gpu_program4"))
       _pScalarBuffer = new PBuffer("r float=16f depth=24");
   else
       _pScalarBuffer = new PBuffer("r float=32f depth=24");

In “streamopGL.h”, in the BoundaryGLComputePolicy::Compute() function:

Change this:

      // left boundary
      if (_bLeft)
      {
        glMultiTexCoord2f(GL_TEXTURE1, 
                          _bLeftPeriodic ? _iTexResS - 2 * _rTexelWidth : 
                                           _rTexelWidth, 0); // offset amount
        glTexCoord2f(_rMinS, _rMinT); glVertex3f(_rMinX, _rMinY, _rZ);
        glTexCoord2f(_rMinS, _rMaxT); glVertex3f(_rMinX, _rMaxY, _rZ);
      }
      // right boundary
      if (_bRight)
      {
        glMultiTexCoord2f(GL_TEXTURE1, 
                          _bRightPeriodic ? -_iTexResS + 2 * _rTexelWidth : 
                                            -_rTexelWidth, 0); // offset amount
        glTexCoord2f(_rMaxS - _rTexelWidth, _rMinT); 
        glVertex3f(_rMaxX - _rPixelWidth, _rMinY, _rZ); 
        glTexCoord2f(_rMaxS - _rTexelWidth, _rMaxT); 
        glVertex3f(_rMaxX - _rPixelWidth, _rMaxY, _rZ);
      }

To this:

      // left boundary
      if (_bLeft)
      {
        glMultiTexCoord2f(GL_TEXTURE1, 
                          _bLeftPeriodic ? _iTexResS - 2 * _rTexelWidth : 
                                           _rTexelWidth, 0); // offset amount
        glTexCoord2f(_rMinS, _rMinT); glVertex3f(_rMinX + 0.5f * _rPixelWidth, _rMinY, _rZ);
        glTexCoord2f(_rMinS, _rMaxT); glVertex3f(_rMinX + 0.5f * _rPixelWidth, _rMaxY, _rZ);
      }
      // right boundary
      if (_bRight)
      {
        glMultiTexCoord2f(GL_TEXTURE1, 
                          _bRightPeriodic ? -_iTexResS + 2 * _rTexelWidth : 
                                            -_rTexelWidth, 0); // offset amount
        glTexCoord2f(_rMaxS - _rTexelWidth, _rMinT); 
        glVertex3f(_rMaxX - 0.5f * _rPixelWidth, _rMinY, _rZ); 
        glTexCoord2f(_rMaxS - _rTexelWidth, _rMaxT); 
        glVertex3f(_rMaxX - 0.5f * _rPixelWidth, _rMaxY, _rZ);
      }

The latter change fixes a bug that never manifested on earlier GPUs. It shows up on G80 because the rasterization rules changed slightly. The boundaries were drawn on pixel edges rather than through pixel centers, so they “fell off the knife edge” on G80.
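To illustrate the knife-edge point, here is a minimal C++ sketch that is independent of OpenGL and of the demo's own code. The helper names `columnContaining` and `columnCenter` are hypothetical, and the floor-based mapping is only a simplified stand-in for real rasterization rules. It shows that a coordinate sitting exactly on a pixel edge can land in either neighboring column after a tiny perturbation, while a coordinate at a pixel center cannot.

```cpp
#include <cmath>

// Hypothetical helper (not part of the SDK): index of the pixel column that
// contains coordinate x, for columns of width pixelWidth starting at x = 0.
// This is a simplified model, not the actual G80 rasterization rule.
int columnContaining(float x, float pixelWidth) {
    return static_cast<int>(std::floor(x / pixelWidth));
}

// Hypothetical helper: x coordinate of the center of a given pixel column.
// Corresponds to the "+ 0.5f * _rPixelWidth" offset in the change above.
float columnCenter(int column, float pixelWidth) {
    return (static_cast<float>(column) + 0.5f) * pixelWidth;
}
```

With a pixel width of 0.25, nudging the edge coordinate 0.25 by ±0.001 flips the result between columns 0 and 1, whereas nudging the center coordinate 0.375 by the same amount stays in column 1.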

Please try out these changes and let me know if it works (and what performance you get!).

The CUDA fluid sample uses the FFT to transform into frequency space and solve the Poisson equation directly, rather than iteratively. The downside is that it can’t handle arbitrary interior boundaries or non-periodic exterior boundary conditions.
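As a rough illustration of what a direct frequency-space solve means, here is a minimal C++ sketch. It is not the CUDA sample's code: it is 1D rather than 2D, it uses a naive O(N^2) DFT in place of an FFT library, and the names `dft` and `solvePoissonPeriodic1D` are hypothetical. It assumes a periodic domain of length 2*pi and a right-hand side with zero mean, and solves u'' = f by dividing each Fourier coefficient by -k^2.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) discrete Fourier transform (stand-in for a real FFT).
// sign = -1 gives the forward transform, sign = +1 the unscaled inverse.
std::vector<cd> dft(const std::vector<cd>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi *
                                 static_cast<double>(k * j) / static_cast<double>(n);
            sum += in[j] * cd(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Solve u'' = f on a periodic domain of length 2*pi sampled at f.size() points.
// Assumes f has zero mean; the mean of u is set to zero.
std::vector<double> solvePoissonPeriodic1D(const std::vector<double>& f) {
    const std::size_t n = f.size();
    std::vector<cd> fhat = dft(std::vector<cd>(f.begin(), f.end()), -1);
    for (std::size_t k = 0; k < n; ++k) {
        // Map the index to a signed wavenumber: 0, 1, ..., n/2, ..., -2, -1.
        const double kk = (k <= n / 2)
                              ? static_cast<double>(k)
                              : static_cast<double>(k) - static_cast<double>(n);
        // In frequency space, u'' = f becomes -k^2 * uhat = fhat.
        fhat[k] = (k == 0) ? cd(0.0, 0.0) : -fhat[k] / (kk * kk);
    }
    const std::vector<cd> uc = dft(fhat, +1);
    std::vector<double> u(n);
    for (std::size_t j = 0; j < n; ++j) {
        u[j] = uc[j].real() / static_cast<double>(n);  // apply the 1/N inverse scaling
    }
    return u;
}
```

The division by -k^2 only works because every Fourier mode is an eigenfunction of the periodic Laplacian, which is why this approach does not extend to arbitrary interior boundaries or non-periodic exterior boundaries.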

Mark

Thanks Mark!

Unfortunately, this still did not seem to fix it, but it looks like it actually is a problem with the 64 bit driver, as netllama suggested. I bit the bullet and installed the card in a 32 bit machine, and the demo runs fine.

Thanks again,

Ted