OpenCL build error

Hi folks,

I have written some OpenCL code, but it does not compile and I can’t figure out why.

Here is my code:

void __kernel square(
    __global int* input,
    __global int* output,
    const unsigned int count)
{
    int i = get_global_id(0);
    output[i] = (int)(log((float)(input[i])+1)*45.986);
}

void __kernel square2d(
    __global image2d_t input,
    __global image2d_t output)
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    float4 pixel = (float4)(0, 0, 0, 0);
    pixel =  read_imagef(input, samp, pos);
    //pixel =  read_imagei(input, samp, pos);
    pixel = (pixel+1) * 45.986;
    write_imagef(output, pos, pixel);
}

The build fails with the following error:

QCLProgram::build: "CL_INVALID_BINARY" 

"ptxas application ptx input, line 113; fatal   : Parsing error near ',': syntax error

ptxas fatal   : Ptx assembly aborted due to errors

error   : Ptx compilation failed: gpu='sm_11', device code='cuModuleLoadDataEx_4'

: Considering profile 'compute_11' for gpu='sm_11' in 'cuModuleLoadDataEx_4'

: Retrieving binary for 'cuModuleLoadDataEx_4', for gpu='sm_11', usage mode='  '

: Considering profile 'compute_11' for gpu='sm_11' in 'cuModuleLoadDataEx_4'

: Control flags for 'cuModuleLoadDataEx_4' disable search path

: Ptx binary found for 'cuModuleLoadDataEx_4', architecture='compute_11'

: Ptx compilation for 'cuModuleLoadDataEx_4', for gpu='sm_11', ocg options='  '

Maybe somebody can help me.



You can get the generated PTX assembler source code by passing CL_PROGRAM_BINARIES to clGetProgramInfo. Doing so reveals the following code being generated (the comma being the one from the error message):

.entry square2d(



I.e., there’s no code being generated for the square2d kernel. You can fix this by rewriting the image arguments as

    __read_only image2d_t input,
    __write_only image2d_t output

Images are __global and __read_only by default, so in your code “output” cannot be written to, which is why no code is generated. However, the NVIDIA compiler really should behave better here and report an error that a __read_only image is being written to.

Thanks for your help. I changed it and it worked. Even…

Now the program crashes while waiting for the OpenCL program to finish.

Is there another issue in my code?



I have had the same issues with the broken PTX and the compiler crashing. See the thread here

NVIDIA’s OpenCL image support is seriously broken currently. You will find a lot of bugs.

Two things I have done to get around these issues:
Use a normal buffer and write your own bilinear sampler.
Use AMD’s GPU or CPU OpenCL platform for reference.

One more thing: change your sampler definition from const sampler_t to __constant sampler_t (trust me)

Have you created a bug report for the broken PTX when omitting __read_only with an image2d_t?

What exactly do you mean by “waiting for the OpenCL program”? My guess is that you probably have another bug in your host code rather than in the kernel code.

I believe you should also write “__kernel void square2d()” instead of “void __kernel square2d()”.

I just filed a report (bug ID 774964).

I changed it to __kernel void square - same problem…

I use the QtOpenCL lib.

Here is my code:

    QString kernelFilename = ":/";
    QString imageFilename = "16BitTST_2041x3001.png"; // b/w

    // load image
    QImage image = QImage(imageFilename);
    if (image.isNull())
    {
        cout<<"Cannot load "<<imageFilename.toStdString() << "." << endl;
        return -1;//EXIT_FAILURE;
    }

    QImage resultImg;
    cout<<"image.bytesPerLine(): "<<image.bytesPerLine()<<endl;
    cout<<"image.bytesPerPixel(): "<<image.bytesPerLine()/image.width()<<endl;
    cout<<"image.width(): "<<image.width()<<endl;
    cout<<"image.height(): "<<image.height()<<endl;

    size_t global;                      // global domain size for our calculation
    size_t local;                       // local domain size for our calculation

    QCLContext context;
    if (!context.create(QCLDevice::GPU))//QCLDevice::CPU))
    {
        fprintf(stderr, "Could not create OpenCL context for the GPU\n");
        return -1;
    }

    QCLImage2D imageCL = context.createImage2DDevice(QImage::Format_ARGB32,image.size(),QCLMemoryObject::ReadOnly);
    const QRect rect;
    if( !imageCL.write(image, rect) )
    {
        qDebug() << "Image could not been read.";
        return -1;
    }

    QCLImage2D OutImage = context.createImage2DDevice(QImage::Format_ARGB32,image.size(),QCLMemoryObject::ReadWrite);

    QCLProgram program = context.buildProgramFromSourceFile(kernelFilename);
    QCLKernel kernel = program.createKernel("square2d");
    //----end init----------------------------------------------------------------

    // Execute the kernel over the entire range of our 1d input data set
    // using the maximum number of work group items for this device

    kernel.setRoundedGlobalWorkSize(image.size()); // to get a multiple of local work size

    cout << "Starting the Program." << endl;

    cout << "Waiting until program finshes." << endl;
    // Wait for the commands to get serviced before reading back results

    // Read back the results from the device to verify the output
    resultImg = OutImage.toQImage();

    // save image
    if(QFile::exists("mod_" + imageFilename))
        QFile::remove("mod_" + imageFilename);
    resultImg.save("mod_" + imageFilename, "PNG");

The program crashes after the “Waiting until program finshes.” output.

EDIT: The program crashes on “resultImg = OutImage.toQImage();”!

2nd EDIT: The program exits somewhere between destructing the OpenCL stuff and “return 0;”. And the output image is blank white.

3rd EDIT: The result image is now what I expected (it was an image format problem).
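
A note on the setRoundedGlobalWorkSize call in the host code above: according to its comment, it pads the global work size up to a multiple of the local work size. The sketch below shows that arithmetic in plain C for the 2041x3001 image, assuming a 16x16 work group (the actual local size is not stated in this thread, and the helper names are made up for this example). With padding, some work items fall outside the image, so a kernel launched this way would typically guard against them. Whether that is related to the crash described above is not established.

```c
#include <stddef.h>

/* Round value up to the next multiple of `multiple` (multiple > 0). */
size_t round_up(size_t value, size_t multiple)
{
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

/* Returns 1 if work item (x, y) lies inside a width x height image.
   A kernel would use the same test to return early for padded items. */
int in_bounds(size_t x, size_t y, size_t width, size_t height)
{
    return x < width && y < height;
}
```

With the assumed 16x16 work group, round_up(2041, 16) gives 2048 and round_up(3001, 16) gives 3008, so the last 7 columns and 7 rows of work items have no pixel to process.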