Instability Problems with GTX 280

Hi all,

We recently received the new GTX 280 devices. The performance is quite impressive (almost a factor of 2 for all our programs), but most of our CUDA programs have severe stability problems.

We make extensive use of shared memory, as you can see from the sample kernel below. The programs run stably for millions of iterations on the 8600 GT, 8800 GTX, 9500m and 9800GX2, but we have serious problems when running them on the GTX 280.

In fact, after a certain number of iterations (the number varies from run to run), the X server freezes and we have to reboot the computer.

We use the most recent 64-bit Ubuntu system, the 177.13 driver and CUDA 2.0 (the same happens with older CUDA versions). The same problem also appears under Windows XP. We have also tested with different GTX 280 boards and different motherboards, and still get the same error.

What are we doing wrong?

    __global__ void solve_xxx_kernel(float* f_global, float* u_global, float* p1_global, float* p2_global, float theta, float lambda, int pitch)
    {
        int x = blockIdx.x*blockDim.x + threadIdx.x;
        int y = blockIdx.y*blockDim.y + threadIdx.y;

        int c = y*pitch + x;

        // Thread index, shifted by one to leave room for the halo cells
        int tx = threadIdx.x+1;
        int ty = threadIdx.y+1;

        // Define arrays for shared memory
        __shared__ float p1_shared[BLOCK_SIZE+1][BLOCK_SIZE+1];
        __shared__ float p2_shared[BLOCK_SIZE+1][BLOCK_SIZE+1];

        float f, u, divergence;

        // load data into shared memory
        f = f_global[c];
        u = u_global[c];
        p1_shared[ty][tx] = p1_global[c];
        p2_shared[ty][tx] = p2_global[c];

        // left halo column: zero at the image border, otherwise the left neighbour
        if (x == 0)
            p1_shared[ty][tx-1] = 0.0;
        else if (tx == 1)
            p1_shared[ty][tx-1] = p1_global[c-1];

        // upper halo row: zero at the image border, otherwise the upper neighbour
        if (y == 0)
            p2_shared[ty-1][tx] = 0.0;
        else if (ty == 1)
            p2_shared[ty-1][tx] = p2_global[c-p];

        // compute update
        divergence = p1_shared[ty][tx]-p1_shared[ty][tx-1] +
                     p2_shared[ty][tx]-p2_shared[ty-1][tx];

        u = (1-theta)*u + theta*(divergence/lambda + f);

        // write back to global memory
        u_global[c] = u;
    }


The lines

    if (y == 0)
        p2_shared[ty-1][tx] = 0.0;
    else if (ty == 1)
        p2_shared[ty-1][tx] = p2_global[c-p];

should read

    if (y == 0)
        p2_shared[ty-1][tx] = 0.0;
    else if (ty == 1)
        p2_shared[ty-1][tx] = p2_global[c-pitch];

I renamed p to pitch to make the code easier to understand.
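For anyone who wants to check the GPU output against a known-good result, here is a minimal CPU sketch of the same update step. It is not part of the original program. It assumes that the second divergence term is the backward difference of p2 in y (mirroring the p1 term), and that pitch is measured in floats rather than bytes. The names solve_reference, max_abs_diff, width and height are made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for one update step of the kernel posted above.
// Assumption: divergence = backward difference of p1 in x plus
// backward difference of p2 in y, with zero outside the image.
// Assumption: pitch is the row stride in floats, not in bytes.
void solve_reference(const std::vector<float>& f, std::vector<float>& u,
                     const std::vector<float>& p1, const std::vector<float>& p2,
                     float theta, float lambda,
                     int width, int height, int pitch)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int c = y * pitch + x;
            // Same border handling as the halo loads in the kernel.
            const float p1_left = (x == 0) ? 0.0f : p1[c - 1];
            const float p2_up   = (y == 0) ? 0.0f : p2[c - pitch];
            const float divergence = (p1[c] - p1_left) + (p2[c] - p2_up);
            u[c] = (1.0f - theta) * u[c] + theta * (divergence / lambda + f[c]);
        }
    }
}

// Largest absolute difference between two result buffers,
// e.g. the CPU reference and the data copied back from the GPU.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b)
{
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Running solve_reference on the same input and comparing with max_abs_diff after each iteration would show whether the results on the GTX 280 drift before the freeze. One thing to keep in mind when comparing: in the kernel as posted, each thread reads p1_shared[ty][tx-1] and p2_shared[ty-1][tx], which are written by neighbouring threads, so a __syncthreads() between the shared memory loads and the divergence computation is normally required for the reads to be well defined.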

Please provide the following information:

0) Attach a complete test app which reproduces the problem
1) Generate and attach an nvidia-bug-report.log (under Linux) while this problem is present
2) Does this problem reproduce if X is not running?
3) Have you verified that you’re using the latest motherboard BIOS?
4) How many iterations are required to consistently reproduce this problem?

Here is the attachment …

Oops, it does not work …
Next try …
test.cpp (754 Bytes)

Next file …

Please rename this file to “”
test2.cpp (2.18 KB)