Hi all,
We recently got the new GTX 280 devices. The performance is quite impressive (almost a factor of 2 speedup across all our programs), but most of our CUDA programs show severe instability.
We make extensive use of shared memory, as you can see from the sample kernel below. The programs run stably for millions of iterations on the 8600 GT, 8800 GTX, 9500M, and 9800 GX2, but we have serious problems when executing on the GTX 280.
After a certain number of iterations (the number varies from run to run), the X server freezes and we have to reboot the machine.
We use the most recent 64-bit Ubuntu release, the 177.13 driver, and CUDA 2.0 (the same happens with older CUDA versions). The same problem also appears under Windows XP. We have tested different GTX 280 boards and different motherboards, and the error persists.
What are we doing wrong?
__global__ void solve_xxx_kernel(float* f_global, float* u_global, float* p1_global, float* p2_global, float theta, float lambda, int pitch)
{
    // Global pixel coordinates and linear index
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    int c = y*pitch + x;

    // Thread index within the block, shifted by one for the halo row/column
    int tx = threadIdx.x+1;
    int ty = threadIdx.y+1;

    // Define arrays in shared memory, with one extra row/column for the halo
    __shared__ float p1_shared[BLOCK_SIZE+1][BLOCK_SIZE+1];
    __shared__ float p2_shared[BLOCK_SIZE+1][BLOCK_SIZE+1];

    float f, u, divergence;

    // Load data into shared memory
    f = f_global[c];
    u = u_global[c];
    p1_shared[ty][tx] = p1_global[c];
    p2_shared[ty][tx] = p2_global[c];

    // Fill the halo: zero at the image border, otherwise the neighbouring pixel
    if (x == 0)
        p1_shared[ty][tx-1] = 0.0f;
    else if (tx == 1)
        p1_shared[ty][tx-1] = p1_global[c-1];
    if (y == 0)
        p2_shared[ty-1][tx] = 0.0f;
    else if (ty == 1)
        p2_shared[ty-1][tx] = p2_global[c-pitch];

    __syncthreads();

    // Compute update
    divergence = p1_shared[ty][tx]-p1_shared[ty][tx-1] +
                 p2_shared[ty][tx]-p2_shared[ty-1][tx];
    u = (1.0f-theta)*u + theta*(divergence/lambda + f);

    // Write back to global memory
    u_global[c] = u;
}
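
For completeness, here is a minimal sketch of how we drive the kernel from the host. The image size, BLOCK_SIZE value, theta/lambda values, and iteration count below are placeholders, not our real settings; input data initialization is omitted:

#include <cuda_runtime.h>
#include <cstdio>

#define BLOCK_SIZE 16   // assumed block size; the kernel above must be in the same translation unit

int main()
{
    const int width = 512, height = 512;   // assumed image size, divisible by BLOCK_SIZE
    size_t pitch_bytes;
    float *f, *u, *p1, *p2;

    // Pitched allocations so each image row starts on an aligned boundary
    cudaMallocPitch((void**)&f,  &pitch_bytes, width * sizeof(float), height);
    cudaMallocPitch((void**)&u,  &pitch_bytes, width * sizeof(float), height);
    cudaMallocPitch((void**)&p1, &pitch_bytes, width * sizeof(float), height);
    cudaMallocPitch((void**)&p2, &pitch_bytes, width * sizeof(float), height);
    int pitch = (int)(pitch_bytes / sizeof(float));   // pitch in elements, as the kernel expects

    // Zero the dual variables; f and u would be filled via cudaMemcpy2D in practice
    cudaMemset2D(p1, pitch_bytes, 0, width * sizeof(float), height);
    cudaMemset2D(p2, pitch_bytes, 0, width * sizeof(float), height);

    dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(width / BLOCK_SIZE, height / BLOCK_SIZE);

    // The kernel is launched once per iteration, for millions of iterations
    for (long i = 0; i < 1000000; ++i)
        solve_xxx_kernel<<<grid, block>>>(f, u, p1, p2, 0.25f, 0.1f, pitch);

    // CUDA 2.0-era synchronization; also surfaces any launch error
    cudaError_t err = cudaThreadSynchronize();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 0;
}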