ptxas internal error when branching on a register: is this an actual bug?

On my Core i7 PC (running Win7) with CUDA 3.1, the simplified version of my PTX code below apparently causes an internal error in ptxas. Am I doing something wrong, or is this an actual bug?

[codebox]
.version 2.1
.target sm_20
.global .u64 foo;
.entry foobar
{
.reg .u64 a;
.reg .u32 t, j0, j1;
mov.u32		j0,	$bar;
mov.u32		j1,	$bar2;
$bar:
ld.global.u64	a,	[foo];
$bar2:
max.u32		t,	j0,	j1;
bra.uni		t;
exit;
}
[/codebox]

:D

Whatever the problem was seems to have been fixed in the 3.2 release candidate. On a WinXP64 machine, with the CUDA 3.1 toolchain, I get:

[…]>ptxas --gpu-name=sm_20 --output-file foo.o foo.ptx

Internal error

On the same machine, with the CUDA 3.2 toolchain, I get no errors, and an object file is produced:

[…]>ptxas --gpu-name=sm_20 --output-file foo.o foo.ptx

[…]>dir foo*

Volume in drive C has no label.

Volume Serial Number is CCF2-BBAF

Directory of […]

10/11/2010 02:33 PM 1,592 foo.o

10/11/2010 02:30 PM 260 foo.ptx

           2 File(s)          1,852 bytes

Are you able to proceed with CUDA 3.2?

I switched to 3.2; alas, my actual program still dies… :( The original source is some 900 lines of PTX code; here’s a similar test case that still causes ptxas to complain. Obviously the program below doesn’t actually make sense: adding the addresses of two labels, masking away the high 16 bits, and then jumping to the result isn’t the smartest thing to do if you want a stable machine. :) But the essence of what I’m trying to do is there. Well, apparently not from ptxas’s point of view… :(

[codebox]
.version 2.1
.target sm_20
.global .u64 foo;
.entry foobar
{
.reg .u64 a;
.reg .u32 t0, t1, j0, j1;
mov.u32		j0,		$bar0;
mov.u32		j1,		$bar1;
ld.global.u64	a,		[foo];
mov.u32		t0,		%tid;
$bar0:
add.u32		t0,		t0,		1;
bfi.b32		j0,		t0,		j0,		26,		6;
add.u32		t1,		j0,		j1;
and.b32		t1,		t1,		0x0000ffff;
bra.uni		t1;
$bar1:
st.u32		[a],		t0;
exit;
}
[/codebox]

:D

I assume that you are a registered developer. If so, it would be helpful if you could file a bug so our toolchain team can follow up on this. If you are unable to file a bug, please let me know (you can also send me a personal message through the forums). Thanks.
