Error: Code selection failed to select: i1 = add nVidia error

I have a problem that only occurs on nVidia platforms. I have traced it to one part of my code: the calls to two functions. The code works under all circumstances on my ATI card, and fails at the same place on two nVidia cards. The error arises while the kernel is being built. The strange thing is that when I greatly reduce the amount of calculation in the functions (fewer terms in the return statement; see the code below), the build sometimes succeeds. If I reduce the calculations only slightly but also reduce other calculations outside the two functions, it may succeed too.

It is worth noting that the code worked when the vectors the functions operate on were in private memory. The error started to occur when I implemented work groups and moved those vectors to local memory (this still works on ATI).

I tried updating drivers. I use PyOpenCL and have tried different versions of it. The error still occurs.

Error message:

Traceback (most recent call last):
  File "application.py", line 79, in <module>
    [energy,standardDeviation,acceptanceRate,runTime, binCounts] = getOperatorMean(systemClass, RP, systemOp)
  File "src/host.py", line 106, in getOperatorMean
    KE = loadKernel(system, BP, operator)
  File "src/host.py", line 303, in loadKernel
    prg.build(options=buildOptions)
  File "/usr/local/lib/python2.6/dist-packages/pyopencl-2011.1.1-py2.6-linux-x86_64.egg/pyopencl/__init__.py", line 447, in build
    cache_dir=cache_dir)
  File "/usr/local/lib/python2.6/dist-packages/pyopencl-2011.1.1-py2.6-linux-x86_64.egg/pyopencl/cache.py", line 421, in create_built_program_from_source_cached
    ctx, src, options, devices, cache_dir)
  File "/usr/local/lib/python2.6/dist-packages/pyopencl-2011.1.1-py2.6-linux-x86_64.egg/pyopencl/cache.py", line 346, in _create_built_program_from_source_cached
    prg.build(options, [devices[i] for i in to_be_built_indices])
  File "/usr/local/lib/python2.6/dist-packages/pyopencl-2011.1.1-py2.6-linux-x86_64.egg/pyopencl/__init__.py", line 164, in program_build
    raise err
pyopencl.RuntimeError: clBuildProgram failed: build program failure - 

Build on <pyopencl.Device 'GeForce GTX 470' at 0x251cb70>:

Error: Code selection failed to select: 0x2ae6010: i1 = add 0x2b73120, 0x2b6e9e0

Code for the functions:

// V - float scalar
// sigma - float scalar
// x - float value from local memory
// y - float value from local memory
inline float operatorFunction(float V, float sigma, float x, float y)
{
    float sqrt3 = 1.73205081f;
    float invSqrtPi = 0.5641895835f;
    float D = 2.0f * sigma * sigma;
    float D_inv = 1.0f / D;

    float A = 4.0f * x * x;
    float B = (x - sqrt3 * y) * (x - sqrt3 * y);
    float C = (x + sqrt3 * y) * (x + sqrt3 * y);

    return -V * D_inv * invSqrtPi * ((A - D) * native_exp(-A * D_inv) +
                                     (B - D) * native_exp(-B * D_inv) +
                                     (C - D) * native_exp(-C * D_inv)) / sigma;
}

inline float potentialFunction(float V, float sigma, float x, float y)
{
    float sqrt3 = 1.73205081f;
    float invSqrtPi = 0.5641895835f;
    float D_inv = 0.5f / (sigma * sigma);

    float A = 4.0f * x * x;
    float B = (x - sqrt3 * y) * (x - sqrt3 * y);
    float C = (x + sqrt3 * y) * (x + sqrt3 * y);

    return V * invSqrtPi * (native_exp(-A * D_inv) +
                            native_exp(-B * D_inv) +
                            native_exp(-C * D_inv)) / sigma;
}
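To check that a reduced or rewritten version of these functions still gives the same numbers, a host-side reference might help. The following is only a sketch in plain Python that mirrors the two functions above. The names `operator_function`, `potential_function` and `_squares` are made up for illustration, and `math.exp` in double precision stands in for `native_exp`, so agreement with the GPU should only be expected to single-precision tolerance.

```python
import math

SQRT3 = 1.73205081          # same constants as in the kernel code
INV_SQRT_PI = 0.5641895835


def _squares(x, y):
    """Return the terms A, B and C used by both kernel functions."""
    a = 4.0 * x * x
    b = (x - SQRT3 * y) ** 2
    c = (x + SQRT3 * y) ** 2
    return a, b, c


def operator_function(V, sigma, x, y):
    """Host-side mirror of operatorFunction (hypothetical helper)."""
    D = 2.0 * sigma * sigma
    total = sum((t - D) * math.exp(-t / D) for t in _squares(x, y))
    return -V / D * INV_SQRT_PI * total / sigma


def potential_function(V, sigma, x, y):
    """Host-side mirror of potentialFunction (hypothetical helper)."""
    D = 2.0 * sigma * sigma
    total = sum(math.exp(-t / D) for t in _squares(x, y))
    return V * INV_SQRT_PI * total / sigma
```

Values read back from the device could then be compared against these with a relative tolerance suited to `float` and `native_exp`.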

Everywhere else I have asked, people seem to think this is clearly caused by the compiler optimizing the code in a way that it then cannot handle the optimized code it generated. Any thoughts at all? Can I provide any more useful information about my situation?
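One way to test the optimizer hypothesis would be to retry the failed build with the standard OpenCL option `-cl-opt-disable`. This is a minimal sketch, not my actual `loadKernel` code: the helper names `with_opt_disabled` and `build_with_fallback` are hypothetical, and it assumes the build options are given either as a list of strings or as one space-separated string.

```python
OPT_DISABLE = "-cl-opt-disable"  # standard OpenCL build option


def with_opt_disabled(build_options):
    """Return the options as a list, with -cl-opt-disable added once."""
    if isinstance(build_options, str):
        build_options = build_options.split()
    options = list(build_options)
    if OPT_DISABLE not in options:
        options.append(OPT_DISABLE)
    return options


def build_with_fallback(ctx, source, build_options):
    """Build normally; if that fails, retry with optimizations disabled."""
    # Imported here so the helper above can be used without PyOpenCL.
    import pyopencl as cl

    try:
        return cl.Program(ctx, source).build(options=build_options)
    except cl.RuntimeError as err:
        print("Optimized build failed: %s" % err)
        return cl.Program(ctx, source).build(
            options=with_opt_disabled(build_options))
```

If the second build succeeds where the first one fails, that would at least be consistent with the optimizer being involved, although it would not prove it.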

There is not much support or activity in here…
Does anyone know where I should turn for support with nVidia-related issues?