Segmentation fault in clBuildProgram

I get a segmentation fault in clBuildProgram when building this:

__kernel void lineOfSight(__read_only image2d_t dem)



I am using a 9400GT, driver 260.19.21, with CUDA SDK 3.2.1, on Ubuntu 9.10 x64.

I have attached the output of oclDeviceQuery.

EDIT: I have attached a core dump.

It works without the __read_only argument.
Strange (driver 263.0 on Windows XP crashes with this argument too).

I had the exact same problem a while ago and reported it to NVIDIA (bug ID 756272). According to them this is fixed after internal build “101128-7986212 (CUDA_A)” and will be resolved in the CUDA 4.0 release (no ETA yet).

Thanks for the information.

Hopefully SDK 4.0 isn’t too far away, as I imagine it will add OpenCL 1.1 support (which AMD and Intel have already shipped).

Is it possible to get a newer driver or beta SDK that has this fixed?

Can’t see anything newer than 3.2 RC2 on the NVDeveloper site.

Not to my knowledge. However, I’ve never seen this bug occur for kernels that actually do something useful, only for “dummy” kernels. Kernels with more arguments in addition to a “__read_only image2d_t” argument work fine for me.

OK I will try it now.

I first got this bug when creating an empty kernel that just had an image2d_t and 3 float arguments.

I have since created the prototype of my kernel using a float buffer, and my own bilinear sample function.

I will convert it back to using an image buffer and see if it now works.

I have found several other bugs in their compiler when you have an empty kernel, such as the PTX missing commas etc. They still need a lot of work on their OpenCL -> PTX compiler.

Also, anyone know what they use to compile it with? Is it the Open64 compiler, or is it an LLVM compiler?

That’s what I was trying to say: I’ve not seen these problems with non-empty, i.e. real-world, kernels so far. Empty kernels are just not very relevant use cases, and fixing those bugs for non-working kernels has low priority I’d say.

They are using clang as a front-end, i.e. LLVM.


I report that I now have CL image support working!

I get at least 10-50x speed boost using CL image objects over a custom bilinear sampler that I was using!

CL image objects are perfect when you want cheap bilinear filtering, and the ability to sample off the image using the clamp to border setting.
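For reference, a minimal sketch of the kernel-side usage with those settings (the kernel signature and names are illustrative, not my real code):

```c
/* Sketch only: sampler with clamp-to-border addressing and linear filtering. */
__constant sampler_t demSampler = CLK_NORMALIZED_COORDS_FALSE |
                                  CLK_ADDRESS_CLAMP |
                                  CLK_FILTER_LINEAR;

__kernel void lineOfSight(__read_only image2d_t dem,
                          __global float *out)
{
    int gid = get_global_id(0);
    /* read_imagef interpolates bilinearly because of CLK_FILTER_LINEAR;
       coordinates outside the image return the (0, 0, 0, 1) border color
       because of CLK_ADDRESS_CLAMP. */
    float2 pos = (float2)(gid + 0.5f, 0.5f);
    out[gid] = read_imagef(dem, demSampler, pos).x;
}
```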

One question:
Is there a way to change the CL image border color like you can in OpenGL? If not, how can I suggest it be added to OpenCL 1.2?
At the moment the current (0, 0, 0, 1) color is good for me, but I would like to be able to change it later.

Also, considering that NVIDIA’s compiler barfs at empty kernels, I think Khronos needs to add some empty-kernel test cases to its OpenCL conformance tests…

I did not find anything in the spec to do so, but you could probably just use an OpenGL texture with a custom border color and call clCreateFromGLTexture2D(). However, even if that worked, it could be argued whether that’s a bug in the OpenCL implementation, because section “Determining the border color” of the OpenCL 1.1 spec only allows 0 for RGB.


The crash has come back again.

I changed the code back but it did not go away…

I now have plenty of code in my kernel and it was working for the last 2 days.

All I did was move my sampler_t from being globally defined to being defined inside my kernel function, because the AMD compiler complained. (Does anyone know what the official spec says about whether a const sampler_t can be defined as a global variable? The NVIDIA compiler seems happy with it being global, but AMD’s is not.)

This is really annoying. I am going to try installing successively older driver versions until I can find a version before this bug was added.

Also, has anyone noticed that NVIDIA’s image support is not compliant with the spec?
In section 4.2, pg. 34 it says:
CL_DEVICE_IMAGE2D_MAX_WIDTH - Max width of 2D image in pixels. The minimum value is 8192 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
CL_DEVICE_IMAGE2D_MAX_HEIGHT - Max height of 2D image in pixels. The minimum value is 8192 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.

But on my GeForce 330, it returns:

Which is just bad.
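For anyone wanting to check their own device, the limits in question come from clGetDeviceInfo; a fragment (assumes a valid cl_device_id named device from earlier setup):

```c
/* Sketch: query the 2D image limits being discussed. Both are size_t. */
size_t max_w = 0, max_h = 0;
clGetDeviceInfo(device, CL_DEVICE_IMAGE2D_MAX_WIDTH,  sizeof(max_w), &max_w, NULL);
clGetDeviceInfo(device, CL_DEVICE_IMAGE2D_MAX_HEIGHT, sizeof(max_h), &max_h, NULL);
printf("2D image max: %zu x %zu\n", max_w, max_h);
```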

It also conflicts with the NVIDIA OpenCL Programming Guide.
In section C.1, pg. 48:
Maximum width and height for a 2D texture reference bound to linear memory or a CUDA array - 65536 x 32768 (for compute capability 1.x)

Why does NVIDIA, the inventor and leader of GPGPU, have the worst OpenCL implementation? Do they secretly wish OpenCL would disappear so that once again proprietary CUDA is the only option?

FYI, this global sampler definition compiles fine for me with both NVIDIA (driver 263.06) and ATI (Stream SDK 2.3):


Odd. This is also what I’m seeing on a GTX 260 with 263.06 drivers, but on a GTX 285 with 258.19 drivers I’m getting 8192 for both width and height. I guess this is a bug in the 263.06 driver, which is a development driver after all. I’ll file a bug report about it and see what they say.

Performance-wise, I believe they still have the best implementation yet, although there certainly is room for improvement.

Check the build log and you will see a warning.

I like to build with -Werror just to be safe.

My application always prints the build log to stdout if it’s non-empty, but there are neither warnings nor errors for me.
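In case it helps anyone reproduce this, the pattern I use is roughly the following host-side sketch (error handling trimmed; assumes a valid cl_program and cl_device_id from earlier setup):

```c
/* Sketch: print the build log after clBuildProgram, succeed or fail. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

static void print_build_log(cl_program program, cl_device_id device)
{
    size_t len = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &len);
    if (len > 1) {                      /* log is non-empty */
        char *log = malloc(len);
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              len, log, NULL);
        printf("%s\n", log);
        free(log);
    }
}
```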

Are you on Windows?

I got that warning with SDK 2.3 on Ubuntu 9.04 x64

Yes, I’m on Vista x64.

I have just tested it on Windows 7 x64 and I do get that warning.

I tried to play around with the declaration a bit. I only get a warning if I use the (invalid) “const” specifier instead of “__constant”. (I believe the sentence “The const qualifier can also be used with the __global qualifier to specify a read-only buffer memory object” on page 185 of the OpenCL 1.1 spec is a typo; it should say “__constant” instead.) For “const”, the AMD compiler understandably complains: “warning: global variable declaration is corrected by the compiler to have addrSpace”.

Interesting, if I use __constant instead of const the error goes away.

Here is my exact line (realising that I never gave it in previous posts):

__constant sampler_t demSampler = CLK_NORMALIZED_COORDS_FALSE |
                                  CLK_ADDRESS_CLAMP |
                                  CLK_FILTER_LINEAR;

I did not realise that __constant and const were similar yet different…

I always thought __constant was to specify that a kernel argument comes from the DirectX/OpenGL constant buffer cache, and that const was to tell the compiler to ensure that no code attempts to modify that variable.

I would have thought const for a global sampler_t was correct though, as a sampler_t cannot be modified by the kernel. (In fact, in the NVIDIA PTX you can see that a global sampler_t is actually compiled in.)

The strange thing is that on pg. 236 of the 1.1 spec:

So either the spec is wrong or AMD is wrong.

Just goes to show that anyone serious about OpenCL compatibility should test their kernel C code with both the AMD and NVIDIA compilers, as they are likely to pick up different things. (Somewhat like the OpenGL shader compilers from AMD vs. NVIDIA, where AMD’s compiler follows the spec more strictly than NVIDIA’s.)

I might go post something on the Khronos forum to see what they say about this. (I have posted to here)


Have you found the kernel.h file NVIDIA has embedded in its DLLs? I notice it defines CLK_FILTER_ANISOTROPIC… (line 321). I put it in and it compiled fine, but I am not able to test whether it actually works (I think I need to specify the filter level in the upper 4 bits). Looks like OpenCL 1.2 might have some further image extensions? (Hopefully they add border color control.) I have attached the header.

After thinking about it again, I believe you’re right. “__constant” is an address specifier, while “const” is a semantic keyword.

Guessing from clUtil’s source code [1], its author Spoo (aka Rick Weber) was already aware of this issue with AMD / NVIDIA. Maybe he knows some details.

Looks interesting, thanks.


After staring at the spec for a bit longer I found this on pg. 184:

This might be what the AMD compiler is enforcing? It seems all global VARIABLES need to be __constant (and hence const is inferred).

But a sampler_t is STATE, not a variable (pg. 192; it is not explicit, but given the restrictions it can be inferred):

So for a VARIABLE it needs to be __constant (and hence stored in global device memory with read-only caching).

For a STATE it needs to be const (and hence stored hard coded as a constant value in the compiled assembly language).

Maybe the specs need to make this distinction clearer… (add a new keyword, like __state instead of const)

So NVIDIA correctly treats sampler_t as state and doesn’t require __constant (in fact, using __constant should generate an error), and uses const as specified in the spec.

AMD incorrectly assumes all global objects are variables (even if they are state) and thus requires __constant to avoid generating an error.
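To make the two readings concrete, here are the competing declarations side by side (which compiler accepts which is exactly the disagreement above):

```c
/* Form 1: sampler as STATE, plain "const" (what NVIDIA accepts silently;
   the value can be baked into the compiled PTX). */
const sampler_t s1 = CLK_NORMALIZED_COORDS_FALSE |
                     CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;

/* Form 2: sampler as a VARIABLE in the __constant address space (what
   AMD's warning steers you toward, and what both compilers accept). */
__constant sampler_t s2 = CLK_NORMALIZED_COORDS_FALSE |
                          CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
```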

This brings up one other interesting question: does a sampler_t count towards the device’s __constant memory limit? (It shouldn’t, as state is hard-coded into the assembly instructions.)