Compiler produces invalid PTX assembly (bug?)

Hi!

I use OpenCL successfully in several applications on my system (Ubuntu 10.04 32-bit, GeForce 8800 GT with Nvidia driver 260.19.12, CUDA Toolkit 3.2.9, OpenCL 1.0).

But when I try to pass an 8-vector by value as an argument to a kernel (which is allowed according to the specification), the program fails to build. Here is a very simple code snippet. Just try to build it; you do not even have to set any kernel arguments at this stage.

__kernel void testkernel(__global int* output, int8 params) {
  output[0] = params.s5;
}

I get the following error output:

Program compilation notice. Returned status: 'CL_BUILD_ERROR',

ptxas application ptx input, line 20; error : Argument 1 of instuction 'add': must be register

ptxas fatal : Ptx assembly aborted due to errors

error : Ptx compilation failed: gpu='sm_11', device code='cuModuleLoadDataEx_17'

: Considering profile 'compute_11' for gpu='sm_11' in 'cuModuleLoadDataEx_17'

: Retrieving binary for 'cuModuleLoadDataEx_17', for gpu='sm_11', usage mode=' '

: Considering profile 'compute_11' for gpu='sm_11' in 'cuModuleLoadDataEx_17'

: Control flags for 'cuModuleLoadDataEx_17' disable search path

: Ptx binary found for 'cuModuleLoadDataEx_17', architecture='compute_11'

: Ptx compilation for 'cuModuleLoadDataEx_17', for gpu='sm_11', ocg options=' '

For some strange reason, compilation succeeds when using params.s0 … params.s4 instead, but not for params.s5 … params.s7.

The same happens with the other 8-vector types: float8, char8, etc.

It seems that there is some very basic problem with accessing the last three components (s5, s6, s7) of any 8-vector! Can anybody confirm that?

[edit: reworked after the first answer (thanks to @jcpalmer) to make things more precise]

Second reply; I did not fully read the post.

One thing you might do is assign img_size.s6 to another variable outside the loop and use that one inside the loop. No particular reason, just as a test:
uint test = img_size.s6;

if (o > 0 && o < test ) {

You could also check whether this even compiles:
kernel void foo(const uint8 x){
x.s6;
}

Alternatively, you could just change the argument to a global.
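
One reading of "change to a global" is to stop passing the 8-vector by value and read it from a __global buffer instead. The following is only a sketch under that assumption, and it has not been checked against the affected driver, so it may or may not avoid the ptxas error. The names KERNEL_SRC, testkernel_workaround, host_int8 and pack_params are hypothetical, and the OpenCL host calls are left out so that the snippet compiles as plain C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of the test kernel: the int8 is read from a
   __global buffer instead of being passed by value. */
static const char KERNEL_SRC[] =
    "__kernel void testkernel_workaround(__global int* output,\n"
    "                                    __global const int8* params) {\n"
    "  output[0] = params[0].s5;\n"
    "}\n";

/* Host-side stand-in for an int8 value: eight 32-bit ints, 32 bytes. */
typedef struct {
    int32_t s[8];
} host_int8;

/* Copy eight values into the struct that would be uploaded as the buffer. */
static host_int8 pack_params(const int32_t values[8]) {
    host_int8 p;
    assert(values != NULL);
    memcpy(p.s, values, sizeof p.s);
    return p;
}
```

On the host, the packed struct would then be copied into a buffer created with clCreateBuffer and bound to argument 1 with clSetKernelArg, in place of the by-value int8.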

[deleted here & merged things into my initial post]

Somebody reported to me that the same problem occurs on a Windows 7 machine (I don’t know the details), which also runs other OpenCL kernels successfully. The same code seems to work without problems under Mac OS.

Is it forbidden to pass 8-vectors ‘by value’ as arguments to kernel functions? I cannot find anything in the OpenCL specification that prohibits it.

I’d be glad if someone could help me with this, or confirm whether the mini-program above builds on your system.
