Switch oddities: compiler bug?

Tigga · August 13, 2008, 3:02pm

I was fiddling with one of my kernels, and I found some odd behaviour which seems to be causing switch statements to evaluate incorrectly.

This works:

unsigned int a = 1;
for (...) {
    <do stuff based on a (value doesn't change)>
    if (a < 3) {
        a++;
        a %= 3;
    }
    else if (a == 3) {
        a = 1337;
    }
}

This also works:

unsigned int a = 1;
for (...) {
    <do stuff based on a (value doesn't change)>
    switch (a) {
        case 0: a = 1; break;
        case 1: a = 2; break;
        case 2: a = 0; break;
    }
}

This doesn’t:

unsigned int a = 1;
for (...) {
    <do stuff based on a (value doesn't change)>
    switch (a) {
        case 0: a = 1; break;
        case 1: a = 2; break;
        case 2: a = 0; break;
        case 3: a = 1337; break;
    }
}

The first and third code blocks are logically identical. The value of a should never be 3; however, adding a case for 3 to the switch statement changes the results, while adding the equivalent check to the if-else chain doesn’t.
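To make the "logically identical" claim concrete, here is a minimal host-side sketch in plain C that puts the if-else update and the four-case switch side by side and checks that they agree. The function names step_if, step_switch and steps_agree are hypothetical and not taken from the actual kernel; the sketch only covers the state update, not the elided loop body.

```c
/* Next value of a using the if/else chain from the first snippet. */
unsigned int step_if(unsigned int a)
{
    if (a < 3) {
        a++;
        a %= 3;
    } else if (a == 3) {
        a = 1337;
    }
    return a;
}

/* Next value of a using the four-case switch from the third snippet. */
unsigned int step_switch(unsigned int a)
{
    switch (a) {
    case 0: a = 1; break;
    case 1: a = 2; break;
    case 2: a = 0; break;
    case 3: a = 1337; break;
    }
    return a;
}

/* Returns 1 if both forms give the same result for every a in 0..3, else 0. */
int steps_agree(void)
{
    for (unsigned int a = 0; a <= 3; a++) {
        if (step_if(a) != step_switch(a))
            return 0;
    }
    return 1;
}
```

On a conforming C compiler steps_agree() returns 1, so any difference seen on the device would have to come from how the surrounding kernel is compiled rather than from the update logic itself.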

Tigga · August 13, 2008, 5:24pm

I’ve investigated slightly further. If I store the value of ‘a’ into a global variable before the switch statement, everything works. This implies to me that some sort of nasty optimization is going on somewhere, messing about with ‘a’ before it gets to the switch statement.

This works:

unsigned int a = 1;
for (...) {
    <do stuff based on a (value doesn't change)>
    some_global_memory[0] = a; // This line is vital!
    switch (a) {
        case 0: a = 1; break;
        case 1: a = 2; break;
        case 2: a = 0; break;
        case 3: a = 1337; break;
    }
}

I’ve tried to produce a small test file showing this, but have thus far failed to reproduce it. Might try again tomorrow.

BTW:
XP Pro
CUDA 2.0
177.41
GeForce GTX 260

cbuchner1 · August 13, 2008, 6:11pm

For good measure, make the variable volatile. This should have the same effect as storing it to a global.

Tigga · August 14, 2008, 8:51am

EDIT: The volatile keyword has no effect. It seems to require a global store.
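For reference, a sketch of what the volatile variant looks like in plain C. The qualifier asks the compiler to perform every read and write of a as written instead of keeping a cached copy. The function name run_volatile and its iterations parameter are hypothetical stand-ins for the elided for (...) loop; the real kernel is only in the attachment.

```c
/* Runs the four-case state update `iterations` times on a volatile counter
 * that starts at 1, and returns the final value. */
unsigned int run_volatile(unsigned int iterations)
{
    volatile unsigned int a = 1;

    for (unsigned int i = 0; i < iterations; i++) {
        /* (work that reads a but does not modify it would go here) */
        switch (a) {
        case 0: a = 1; break;
        case 1: a = 2; break;
        case 2: a = 0; break;
        case 3: a = 1337; break;
        }
    }
    return a;
}
```

Starting from 1, the value cycles 2, 0, 1, ... and never reaches 3, so case 3 is unreachable in this sketch.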

Tigga · August 14, 2008, 9:47am

Okies - I have a simplish test program that reproduces the error. All the functions listed should give the same results, however test2_GPU only gives the correct results when run in emulation mode.

Here is my output when run on the GPU:

Size 3: GPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: GPU
0.000000
10.000000
20.000000
30.000000
30.000000
-----------
Size 3: CPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: CPU
0.000000
10.000000
10.000000
10.000000
20.000000

I think I’d still be quite surprised if this is a compiler error; however, I just can’t see any logic errors, and the emulation mode works fine. The GPU and CPU code is identical, and it only runs 1 thread in this test case.

EDIT: Attachment didn’t work

EDIT 2: See two posts down for attachment.

Reimar · August 14, 2008, 10:22am

Well, at least your CPU code fails to initialize x[0] AFAICT

Tigga · August 14, 2008, 10:44am

Oops :">. Must have uploaded an old version. Sorry!

Fixed it - gives the same results.
test.txt (5.1 KB)

Tigga · August 15, 2008, 9:10am

Anybody else had a chance to look at this? I’m still at a loss to explain why it isn’t working.

Sarnath · August 18, 2008, 8:06am

What is the effect that you are seeing?

Is it an effect on performance OR program correctness?

Geli · August 18, 2008, 8:33am

I think it’s a bad idea to use doubles on the GPU. I changed everything to floats in your example and voilà:

Size 3: GPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: GPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 3: CPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: CPU
0.000000
10.000000
10.000000
10.000000
20.000000

Tigga · August 18, 2008, 8:57am

Program correctness. All of the runs should be calculating the same thing, however the second run (Size 4: GPU) produces different results.
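A correctness check like this can be automated rather than compared by eye. The following is a minimal sketch in plain C, assuming both result sets have been copied into host arrays; the function name first_mismatch and the tolerance parameter tol are hypothetical and not part of the attached test file.

```c
#include <stddef.h>

/* Returns the index of the first element where the two arrays differ by
 * more than tol, or -1 if all n elements agree within tol. */
int first_mismatch(const double *gpu, const double *cpu, size_t n, double tol)
{
    for (size_t i = 0; i < n; i++) {
        double diff = gpu[i] - cpu[i];
        if (diff < 0.0)
            diff = -diff;          /* absolute difference without libm */
        if (diff > tol)
            return (int)i;
    }
    return -1;
}
```

Fed the "Size 4: GPU" and "Size 4: CPU" columns from the first set of output in this thread, it reports index 2 (20.0 on the GPU against 10.0 on the CPU).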

Tigga · August 18, 2008, 8:59am

My program needs to run in double precision, and double precision is supported on my card, though it’s interesting that it works fine with single (or maybe just on your card/setup).

Geli · August 18, 2008, 9:17am

Hmm, the -arch sm_13 does make a difference with nvcc 2, but well, I guess it’s just really beta and buggy anyway… What does your nvcc -V say?

Mine seems a bit weird (I have 1.1 and 2.0 installed):

Here’s the 2.0:

stephaga@biwidl02:~/cuda2/cuda/bin $ ./nvcc -V && which ./nvcc
nvcc: NVIDIA ® Cuda compiler driver
Built on Tue_Jun_10_05:42:45_PDT_2008
Cuda compilation tools, release 1.1, V0.2.1221
./nvcc

→ Cuda compilation tools, release 1.1, V0.2.1221 (shouldn’t this be release 2.0?)

Here’s the 1.1:

stephaga@biwidl02:~/cuda2/cuda/bin $ nvcc -V && which nvcc
nvcc: NVIDIA ® Cuda compiler driver
Built on Thu_Nov_29_19:14:37_PST_2007
Cuda compilation tools, release 1.1, V0.2.1221
/usr/sepp/bin/nvcc
stephaga@biwidl02:~/cuda2/cuda/bin $

This seems OK. Also, if I use this compiler, the -arch sm_13 option fails.

stephaga@biwidl02:~/cuda2 $ ~/cuda2/cuda/bin/nvcc ../cudabug2double.cu -I ~/cuda2/sdk/common/inc/ -o cudabug2double
stephaga@biwidl02:~/cuda2 $ ./cudabug2double
Size 3: GPU
524288.000000
524288.127197
0.000000
0.000000
0.000000
-----------
Size 4: GPU
524288.000000
524288.127197
0.000000
0.000000
0.000000
-----------
Size 3: CPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: CPU
0.000000
10.000000
10.000000
10.000000
20.000000

stephaga@biwidl02:~/cuda2 $ ~/cuda2/cuda/bin/nvcc -arch sm_13 ../cudabug2double.cu -I ~/cuda2/sdk/common/inc/ -o cudabug2double
stephaga@biwidl02:~/cuda2 $ ./cudabug2double
Size 3: GPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: GPU
0.000000
10.000000
20.000000
30.000000
39.992187
-----------
Size 3: CPU
0.000000
10.000000
10.000000
10.000000
20.000000
-----------
Size 4: CPU
0.000000
10.000000
10.000000
10.000000
20.000000

S.Warris · August 19, 2008, 2:48am

You initialize only the first element:

double xRegCache[3];
xRegCache[0] = x[0];

It is difficult to see, with all the for loops and switches, whether or not you are using uninitialized memory. On the CPU the values may happen to be zero (fresh stack memory often is), though strictly they are indeterminate there as well; on the GPU they are undefined. Is this the problem?

Edit: typo

Tigga · August 19, 2008, 8:35am

The rest of the elements are initialised within the loop.

Interesting. If I manually initialize the variables, I do indeed get different results for the GPU with size 4; however, I still get the same results for all the other tests. This means that the GPU version of the algorithm is accessing uninitialized variables, while an identical CPU version isn’t. Given that the GPU is just running one thread, this seems wrong.
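One way to see which slots are read before the loop writes them is to fill the cache with a recognizable sentinel instead of leaving it uninitialized. Below is a minimal sketch in plain C. xRegCache and x come from the lines quoted above; the function name init_cache and the value of CACHE_SENTINEL are hypothetical choices for illustration.

```c
/* Hypothetical marker value: chosen to stand out in printed output. */
#define CACHE_SENTINEL (-12345.0)

/* Initializes a 3-element cache: slot 0 from x[0] as in the original code,
 * the remaining slots with the sentinel so a read-before-write is visible. */
void init_cache(double cache[3], const double *x)
{
    cache[0] = x[0];
    for (int i = 1; i < 3; i++)
        cache[i] = CACHE_SENTINEL;
}
```

If the sentinel shows up in the printed results of one build but not another, that build is reading a slot the loop has not yet written.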

Tigga · August 19, 2008, 8:37am

nvcc: NVIDIA ® Cuda compiler driver
Built on Thu_Jun_12_01:14:00_PDT_2008
Cuda compilation tools, release 1.1, V0.2.1221

The timestamp is slightly different, but the release number is the same.

Tigga · September 10, 2008, 9:18am

Just got back around to this part of the program. This error seems to be still here with CUDA 2.0.
