OpenCL: very odd crash when compiling

Hi all,
Today, while compiling my project on my laptop, I encountered a very weird error.

This is one of the kernels where the problem occurred:

kernel void get_used_voxels(read_only image3d_t dataGrid, 			//0 - Data volume
							read_only image2d_t edgesTable,			//1 - Edge table
							read_only image2d_t numVertexTable,		//2 - Vertices per voxel table
							const float threshold,					//3 - Threshold
							global uint *vMask,						//4 - Vertex mask
							global uint *eMask,						//5 - Edge mask
							global uint *numVertex,					//6 - Number of vertices per voxel cube
							global uint *occupied,					//7 - 1 if the voxel is used, 0 otherwise
							uint4 vSize,							//8 - Global size of the voxel volume
							local float *cubeVertex
							){				
							
	const uint idx = get_global_id(0);
	const uint threadId = get_local_id(0);
	const uint wgs = get_local_size(0);

//rIndex performs a few bitwise operations to compute the Morton-order index
	const uint4 coord = rIndex(idx, vSize);

	if(coord.x >= vSize.x || coord.y >= vSize.y || coord.z >= vSize.z)
		return;
	
//	float cubeVertex[8];

	//The 8 vertices of the cube
	cubeVertex[0*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x,     coord.y + 1, coord.z    , 0)).x;
	cubeVertex[1*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x + 1, coord.y + 1, coord.z    , 0)).x;
	cubeVertex[2*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x + 1, coord.y,     coord.z    , 0)).x;
	cubeVertex[3*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x    , coord.y    , coord.z    , 0)).x;	
	
	cubeVertex[4*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x,     coord.y + 1, coord.z + 1, 0)).x;
	cubeVertex[5*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x + 1, coord.y + 1, coord.z + 1, 0)).x;
	cubeVertex[6*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x + 1, coord.y,     coord.z + 1, 0)).x;
	cubeVertex[7*wgs + threadId] = read_imagef(dataGrid, smp, (int4)(coord.x,     coord.y,     coord.z + 1, 0)).x;

	uint mask = 0;
	mask |= ((cubeVertex[0*wgs + threadId] < threshold) << 0);
	mask |= ((cubeVertex[1*wgs + threadId] < threshold) << 1);
	mask |= ((cubeVertex[2*wgs + threadId] < threshold) << 2);
	mask |= ((cubeVertex[3*wgs + threadId] < threshold) << 3);
	mask |= ((cubeVertex[4*wgs + threadId] < threshold) << 4);
	mask |= ((cubeVertex[5*wgs + threadId] < threshold) << 5);
	mask |= ((cubeVertex[6*wgs + threadId] < threshold) << 6);
	mask |= ((cubeVertex[7*wgs + threadId] < threshold) << 7);
	
	vMask[idx] = mask;
//Both edgesTable and numVertexTable are 2D textures created with OpenGL and shared with OpenCL
	eMask[idx] = read_imageui(edgesTable, smp, (int2)(mask, 0)).x;
	numVertex[idx] = read_imageui(numVertexTable, smp, (int2)(mask, 0)).x;
	
	occupied[idx] = (mask > 0 && mask < 0xFF);
}

When I compile this with clcc and clBuildProgram, both crash with an access violation, but if I swap the last lines like this:

vMask[idx] = mask;
	eMask[idx] = read_imageui(edgesTable, smp, (int2)(mask, 0)).x;
	numVertex[idx] = read_imageui(numVertexTable, smp, (int2)(mask, 0)).x;
	
	occupied[idx] = (mask > 0 && mask < 0xFF);

the code compiles…

Guess this is a big bug in the compiler, isn’t it?

The platform where the crash occurred is a GeForce M9600GT, while on my primary workstation with a GeForce 470GTX the project compiles (and runs) fine.

Any ideas?
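
As a cross-check that the mask logic itself is sound, the per-voxel computation can be reproduced on the CPU. The following is a minimal sketch in plain C, not the kernel itself: cube_mask and is_occupied are hypothetical helper names, and the eight corner samples are passed in as a plain array instead of being read from the 3D image.

```c
/* Hypothetical CPU helper (not part of the original kernel): builds the
   8-bit corner mask the same way the kernel does, setting bit i when
   corner i lies below the threshold. */
static unsigned cube_mask(const float corner[8], float threshold)
{
    unsigned mask = 0;
    for (int i = 0; i < 8; ++i)
        mask |= (unsigned)(corner[i] < threshold) << i;
    return mask;
}

/* Mirrors the kernel's last line: a voxel counts as occupied only when
   its corners are neither all below nor all at/above the threshold. */
static int is_occupied(unsigned mask)
{
    return mask > 0 && mask < 0xFF;
}
```

For example, with corner values {0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.2} and a threshold of 0.5, only corners 0 and 7 are below the threshold, so the mask is 0x81 and the voxel counts as occupied.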

Your ‘swapped’ lines are identical on the top and bottom – assuming copy-paste error, go at it again.

It probably helps if you can narrow down the compiler crash to a simple reproducible example.

Chances are you will end up with a real bug, or you will find that you fed in incorrect arguments to some of those functions and will be able to see where the error is. Give it a shot.

Yes, copy-paste error. The swapped lines are:

vMask[idx] = mask;
occupied[idx] = (mask > 0 && mask < 0xFF);
eMask[idx] = read_imageui(edgesTable, smp, (int2)(mask, 0)).x;
numVertex[idx] = read_imageui(numVertexTable, smp, (int2)(mask, 0)).x;

In the unswapped version, the crash occurs when I try to assign the value returned by either of the two read_imageui calls to the global array. It happens even if I use a temporary variable and then assign that variable to the global array, like this:

uint tmp = read_imageui(numVertexTable, smp, (int2)(mask, 0)).x;
numVertex[idx] = tmp;

I thought I was feeding the functions wrong parameters, but I ruled that out when I found that swapping the lines “fixed” the problem.

I tried the driver version included in the CUDA SDK and the latest version available: same results.
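
The four stores at the end of the kernel each write to a different buffer and depend only on mask and idx, so, assuming the buffers do not overlap, their order should not change the result. A minimal CPU sketch in plain C illustrates this. Everything here is hypothetical scaffolding: the Outputs struct stands in for the four global buffers, and the edges and counts arrays replace the two table images with arbitrary filler values.

```c
#include <string.h>

#define N 4  /* hypothetical number of voxels for this illustration */

/* Hypothetical stand-in for the kernel's four output buffers. */
typedef struct {
    unsigned vMask[N], eMask[N], numVertex[N], occupied[N];
} Outputs;

/* Store order from the original kernel. */
static void store_original(Outputs *o, unsigned idx, unsigned mask,
                           const unsigned *edges, const unsigned *counts)
{
    o->vMask[idx] = mask;
    o->eMask[idx] = edges[mask];
    o->numVertex[idx] = counts[mask];
    o->occupied[idx] = (mask > 0 && mask < 0xFF);
}

/* Store order from the swapped version that compiles. */
static void store_swapped(Outputs *o, unsigned idx, unsigned mask,
                          const unsigned *edges, const unsigned *counts)
{
    o->vMask[idx] = mask;
    o->occupied[idx] = (mask > 0 && mask < 0xFF);
    o->eMask[idx] = edges[mask];
    o->numVertex[idx] = counts[mask];
}

/* Returns 1 when both orders leave identical buffer contents. */
static int orders_agree(void)
{
    unsigned edges[256], counts[256];
    for (unsigned m = 0; m < 256; ++m) {
        edges[m] = m * 7u;    /* arbitrary filler, not real table data */
        counts[m] = m % 16u;  /* arbitrary filler, not real table data */
    }
    const unsigned masks[N] = {0x00, 0x81, 0x3C, 0xFF};
    Outputs a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    for (unsigned i = 0; i < N; ++i) {
        store_original(&a, i, masks[i], edges, counts);
        store_swapped(&b, i, masks[i], edges, counts);
    }
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Since both orders produce the same data on the CPU, a build that crashes for one order and succeeds for the other points at the compiler rather than at the source.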

Unfortunately I’m not familiar with the OpenCL functions, so I can’t comment on those, but perhaps someone else can.

Are the compilers on both systems the same version(s)? It might be that either nvcc or gcc versions are at fault here, and GPU architectures are irrelevant.

Try the approach I mentioned of getting a small reproducible example. It does look like it might be a bug given what you’ve mentioned, and NVIDIA would want a small example if you send in a bug report.