You can retrieve the generated PTX assembly by passing CL_PROGRAM_BINARIES to clGetProgramInfo. Doing so shows that the following code is generated (the lone comma is the one the error message complains about):
.entry square2d(
,
)
I.e., no code is generated for the square2d kernel. You can fix this by giving the image arguments explicit access qualifiers: __read_only for the image you read from and __write_only for “output”.
Images are __global and __read_only by default, so in your code “output” cannot be written to, and that is why no code is generated. However, the NVIDIA compiler really should behave better and report an error that a __read_only image is being written to.
I have had the same issues with the broken PTX and the compiler crashing. See the thread here
NVIDIA’s OpenCL image support is seriously broken currently. You will find a lot of bugs.
Two things I have done to get around these issues:
Use a normal buffer and write your own bilinear sampler.
Use AMD’s GPU or CPU OpenCL platform for reference.
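To make the first workaround concrete, here is a minimal host-side C++ sketch of a hand-written bilinear sampler over a plain float buffer. It is not taken from anyone's code in this thread: the names sampleBilinear and texelClamped are hypothetical, the image is assumed to be single-channel and row-major, and the coordinates are assumed to be unnormalized with clamp-to-edge addressing (texel centers at integer + 0.5, as with CLK_FILTER_LINEAR). The same arithmetic should carry over to a kernel that reads from a __global float buffer.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: fetch one texel, clamping the coordinates to the
// image edge (comparable to CLK_ADDRESS_CLAMP_TO_EDGE).
static float texelClamped(const std::vector<float>& img,
                          int width, int height, int x, int y)
{
    x = std::max(0, std::min(x, width - 1));
    y = std::max(0, std::min(y, height - 1));
    return img[static_cast<std::size_t>(y) * width + x];
}

// Hypothetical bilinear sampler for a single-channel, row-major image.
// (x, y) are unnormalized coordinates; texel centers sit at integer + 0.5.
float sampleBilinear(const std::vector<float>& img,
                     int width, int height, float x, float y)
{
    // Shift so that texel centers fall on integer positions.
    const float fx = x - 0.5f;
    const float fy = y - 0.5f;

    // Top-left texel of the 2x2 neighborhood and the fractional offsets.
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float ax = fx - static_cast<float>(x0);
    const float ay = fy - static_cast<float>(y0);

    // Interpolate horizontally on both rows, then vertically between them.
    const float top = (1.0f - ax) * texelClamped(img, width, height, x0, y0)
                    + ax * texelClamped(img, width, height, x0 + 1, y0);
    const float bottom = (1.0f - ax) * texelClamped(img, width, height, x0, y0 + 1)
                       + ax * texelClamped(img, width, height, x0 + 1, y0 + 1);
    return (1.0f - ay) * top + ay * bottom;
}
```

Sampling exactly at a texel center returns that texel unchanged, and sampling at the corner shared by four texels returns their average, which is a quick way to check a port of this into kernel code.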
One more thing: change your sampler definition from const sampler_t to __constant sampler_t (trust me)
eyebex:
Have you created a bug report for the broken PTX when omitting __read_only with an image2d_t?
What exactly do you mean by “waiting for the OpenCL program”? My guess is that you probably have another bug in your host code rather than in the kernel code.
I believe you should also write “__kernel void square2d()” instead of “void __kernel square2d()”.
I changed it to __kernel void square - same problem…
I use the QtOpenCL lib.
Here is my code:
QString kernelFilename = ":/kernel_source.cl";
QString imageFilename = "16BitTST_2041x3001.png"; // b/w
// load image
QImage image = QImage(imageFilename);
if (image.isNull())
{
    cout<<"Cannot load "<<imageFilename.toStdString() << "." << endl;
    return -1;//EXIT_FAILURE;
}
QImage resultImg;
cout<<"image.bytesPerLine(): "<<image.bytesPerLine()<<endl;
cout<<"image.bytesPerPixel(): "<<image.bytesPerLine()/image.width()<<endl;
cout<<"image.width(): "<<image.width()<<endl;
cout<<"image.height(): "<<image.height()<<endl;
size_t global; // global domain size for our calculation
size_t local; // local domain size for our calculation
//----init--------------------------------------------------------------------
QCLContext context;
if (!context.create(QCLDevice::GPU))//QCLDevice::CPU))
{
    fprintf(stderr, "Could not create OpenCL context for the GPU\n");
    return -1;
}
QCLImage2D imageCL = context.createImage2DDevice(QImage::Format_ARGB32,image.size(),QCLMemoryObject::ReadOnly);
const QRect rect;
if( !imageCL.write(image, rect) )
{
    qDebug() << "Image could not be written to the device.";
    return -1;
}
QCLImage2D OutImage = context.createImage2DDevice(QImage::Format_ARGB32,image.size(),QCLMemoryObject::ReadWrite);
QCLProgram program = context.buildProgramFromSourceFile(kernelFilename);
QCLKernel kernel = program.createKernel("square2d");
//----end init----------------------------------------------------------------
// Execute the kernel over the entire 2D image,
// using the best local work size reported for this kernel
kernel.setLocalWorkSize(kernel.bestLocalWorkSizeImage2D());
kernel.setRoundedGlobalWorkSize(image.size()); // to get a multiple of the local work size
cout << "Starting the Program." << endl;
kernel(imageCL,OutImage);
cout << "Waiting until program finshes." << endl;
// Wait for the queued commands to get serviced before reading back results
context.finish();
// Read back the results from the device to verify the output
resultImg = OutImage.toQImage();
// save image
if(QFile::exists("mod_" + imageFilename))
    QFile::remove("mod_" + imageFilename);
resultImg.save("mod_" + imageFilename, "PNG");
The program crashes after the “Waiting until program finshes.” output.
EDIT: The program crashes on “resultImg = OutImage.toQImage();”!
2nd EDIT: The program exits somewhere between destructing the OpenCL stuff and “return 0;”. And the output image is blank white.
3rd EDIT: The result image now looks the way I expected (it was an image format problem).
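One thing worth checking in host code like the above: assuming setRoundedGlobalWorkSize rounds each dimension up to the next multiple of the local work size, the kernel is launched with more work-items than there are pixels, so the kernel should compare get_global_id against the image dimensions before writing. The sketch below only illustrates the rounding arithmetic; roundUpToMultiple and paddingWorkItems are hypothetical helper names, and the local size of 16 used in the note afterwards is an assumption, not something QtOpenCL is known to pick here.

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of `local` that is >= `size`.
// `local` must be greater than zero.
std::size_t roundUpToMultiple(std::size_t size, std::size_t local)
{
    return ((size + local - 1) / local) * local;
}

// Hypothetical helper: number of extra work-items in one dimension that
// fall outside the image and must be skipped inside the kernel.
std::size_t paddingWorkItems(std::size_t size, std::size_t local)
{
    return roundUpToMultiple(size, local) - size;
}
```

For the 2041x3001 image from the post and an assumed 16x16 local size, this gives a global size of 2048x3008, i.e. 7 extra columns and 7 extra rows of work-items that have no pixel to process.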