I have a simple kernel that fails to build under NVIDIA’s OpenCL driver:
[codebox]
__kernel void deviceWritePairedConstants(__global uint* base,uint N,uint pattern0,uint pattern1) {
const uint pattern = ((get_local_id(0)) & 0x1) ? pattern1 : pattern0;
for (uint i = 0 ; i < N; i++) {
*((base + get_group_id(0) * N * get_local_size(0) + i * get_local_size(0) + get_local_id(0))) = pattern;
}
}
[/codebox]
This kernel fails clBuildProgram with the following errors; the failure occurs when compiling for either a GTX 480 or a GTX 260, though the sm_XX flags obviously change:
[codebox]
ptxas application ptx input, line 38; error : Argument 1 of instuction ‘selp’: must be register
ptxas application ptx input, line 38; error : Argument 2 of instuction ‘selp’: must be register
ptxas fatal : Ptx assembly aborted due to errors
error : Ptx compilation failed: gpu=‘sm_20’, device code=‘cuModuleLoadDataEx_5’
: Considering profile ‘compute_20’ for gpu=‘sm_20’ in ‘cuModuleLoadDataEx_5’
: Retrieving binary for ‘cuModuleLoadDataEx_5’, for gpu=‘sm_20’, usage mode=’ ’
: Considering profile ‘compute_20’ for gpu=‘sm_20’ in ‘cuModuleLoadDataEx_5’
: Control flags for ‘cuModuleLoadDataEx_5’ disable search path
: Ptx binary found for ‘cuModuleLoadDataEx_5’, architecture=‘compute_20’
: Ptx compilation for ‘cuModuleLoadDataEx_5’, for gpu=‘sm_20’, ocg options=’ ’
[/codebox]
The (total hack of a) workaround is to replace the ternary operator with the following, which runs fine:
[codebox]
uint isodd = get_local_id(0) & 0x1;
isodd *= 0xFFFFFFFF;
const uint pattern = (isodd & pattern1) | ((~isodd) & pattern0);
[/codebox]
Driver details:
[codebox]
OpenCL SW Info:
CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.0 CUDA 3.1.1
OpenCL SDK Revision: 5537818
…
Device GeForce GTX 480
CL_DEVICE_NAME: GeForce GTX 480
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 256.35
[/codebox]
The original version builds and runs fine under AMD’s Stream SDK on a Radeon 5870.