OpenCL compiler bug - incorrect PTX output for ternary operator

ihaque · July 22, 2010, 7:11am

I have a simple kernel that fails to build under NVIDIA’s OpenCL driver:

[codebox]

__kernel void deviceWritePairedConstants(__global uint* base,uint N,uint pattern0,uint pattern1) {

const uint pattern = ((get_local_id(0)) & 0x1) ? pattern1 : pattern0;

for (uint i = 0 ; i < N; i++) {

    *((base + get_group_id(0) * N * get_local_size(0) + i * get_local_size(0) + get_local_id(0))) = pattern;

}

}

[/codebox]

This kernel fails clBuildProgram with the following errors; the failure occurs when compiling for either a GTX 480 or a GTX 260, though the sm_XX flags obviously change:

[codebox]

ptxas application ptx input, line 38; error : Argument 1 of instuction ‘selp’: must be register

ptxas application ptx input, line 38; error : Argument 2 of instuction ‘selp’: must be register

ptxas fatal : Ptx assembly aborted due to errors

error : Ptx compilation failed: gpu=‘sm_20’, device code=‘cuModuleLoadDataEx_5’

: Considering profile ‘compute_20’ for gpu=‘sm_20’ in ‘cuModuleLoadDataEx_5’

: Retrieving binary for ‘cuModuleLoadDataEx_5’, for gpu=‘sm_20’, usage mode=’ ’

: Considering profile ‘compute_20’ for gpu=‘sm_20’ in ‘cuModuleLoadDataEx_5’

: Control flags for ‘cuModuleLoadDataEx_5’ disable search path

: Ptx binary found for ‘cuModuleLoadDataEx_5’, architecture=‘compute_20’

: Ptx compilation for ‘cuModuleLoadDataEx_5’, for gpu=‘sm_20’, ocg options=’ ’

[/codebox]

The (total hack of a) workaround is to replace the ternary operator with the following, which runs fine:

[codebox]

uint isodd = get_local_id(0) & 0x1;

isodd *= 0xFFFFFFFF;

const uint pattern = (isodd & pattern1) | ((~isodd) & pattern0);

[/codebox]

Driver details:

[codebox]

OpenCL SW Info:

CL_PLATFORM_NAME: NVIDIA CUDA

CL_PLATFORM_VERSION: OpenCL 1.0 CUDA 3.1.1

OpenCL SDK Revision: 5537818

…

Device GeForce GTX 480

CL_DEVICE_NAME: GeForce GTX 480

CL_DEVICE_VENDOR: NVIDIA Corporation

CL_DRIVER_VERSION: 256.35

[/codebox]

The original version builds and runs fine under AMD’s Stream SDK on a Radeon 5870.