I’m having a vexing problem reading from __constant memory. I wrote a 3x3 median filter that works on tiled images, with the x/y coordinates of each tile’s upper left corner passed via a __constant memory buffer (the idea is to reduce processing effort when only part of the image is interesting). The code works for one run, then fails on the next with identical inputs. After much experimentation I figured out that the OpenCL kernel reads the wrong coordinates out of the __constant buffer. I made sure the buffer is smaller than CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, I have only a single __constant qualifier in my kernel arguments, and I verified that the data is copied correctly into device memory (I read it back and compared it with the original data). At my wits’ end, I simply replaced __constant with __global, and the code now works correctly.
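To make the setup concrete, here is a minimal CPU sketch in plain C of what the filter is meant to compute for one tile. It is not the actual kernel: the names (`TileOrigin`, `median3x3_tile`, `median9`, `clamp_int`) are made up for illustration, and the 8-bit row-major image layout and the edge-clamping border handling are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one (x, y) pair per tile, giving the tile's
 * upper left corner in image coordinates. On the device this is the
 * data that lives in the __constant (or __global) buffer. */
typedef struct { int x; int y; } TileOrigin;

/* Clamp v into [lo, hi] so that border pixels replicate the edge. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Median of 9 values: insertion sort into a local array, take the middle. */
static uint8_t median9(const uint8_t *v)
{
    uint8_t s[9];
    for (int i = 0; i < 9; ++i) {
        uint8_t key = v[i];
        int j = i - 1;
        while (j >= 0 && s[j] > key) {
            s[j + 1] = s[j];
            --j;
        }
        s[j + 1] = key;
    }
    return s[4];
}

/* Filter the tile_w x tile_h tile whose upper left corner is
 * origins[tile_index]. Pixels outside the image are skipped; pixels
 * outside the tile are left untouched in dst. */
void median3x3_tile(const uint8_t *src, uint8_t *dst, int width, int height,
                    const TileOrigin *origins, size_t tile_index,
                    int tile_w, int tile_h)
{
    assert(src && dst && origins && width > 0 && height > 0);
    const TileOrigin o = origins[tile_index];

    for (int ty = 0; ty < tile_h; ++ty) {
        for (int tx = 0; tx < tile_w; ++tx) {
            const int x = o.x + tx;
            const int y = o.y + ty;
            if (x < 0 || x >= width || y < 0 || y >= height)
                continue;

            /* Gather the 3x3 neighbourhood with clamped coordinates. */
            uint8_t win[9];
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = clamp_int(x + dx, 0, width - 1);
                    const int sy = clamp_int(y + dy, 0, height - 1);
                    win[n++] = src[(size_t)sy * (size_t)width + (size_t)sx];
                }
            }
            dst[(size_t)y * (size_t)width + (size_t)x] = median9(win);
        }
    }
}
```

Running something like this on the host with the same origin table and comparing tile by tile against the device output is one way to see which tiles were filtered at the wrong coordinates.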
So my question(s): is there anything else besides CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE that needs to be taken into account when using __constant? Does the constant memory buffer have to be created with CL_MEM_READ_ONLY for this to work reliably? (I can’t test this easily, since most of the OpenCL support code on the host was written by somebody else and I don’t want to muck around in that part of the code if I can help it.) Or is this possibly a bug in the NVIDIA OpenCL implementation?
P.S.: All tests were run on an x86-64 Linux system using ‘OpenCL 1.0 CUDA 3.0.1’ and the 195.30 drivers.