clBuildProgram crashing with an access violation error

I’m trying to compile a kernel after setting up the context and command queue and loading the program, but I keep getting this error: Access violation writing location 0x000000000000000c. The crash happens inside nvcompiler.dll. The call itself is simple, just clBuildProgram(program, 0, NULL, NULL, NULL, NULL); so there isn’t much that should go wrong. I also have a fair amount of error checking beforehand that bails out if initializing the devices or creating the context or queue fails, so I’m fairly confident I got those right.

I’m drawing a blank. The only explanation I can come up with is that the string holding the kernel source is somehow malformed, but that also seems unlikely. Code follows; any insight would be very helpful. I’m on Windows 7 x64, using the CUDA 3.0 beta and the 196.21 WHQL driver.

Kernel:

[codebox]const char *kernelSource =
"__kernel void accumulate(__global ushort *reference, __global ushort *scanPoints, __global int *out)\n"\
"{\n"\
	"int xIndex = get_local_id(0);\n"\
	"int yIndex = get_local_id(1);\n"\
	"int zIndex = get_local_id(2);\n"\
	"float deltaTheta = (float)(xIndex - 40);\n"\
	"float displaceX = (float)(yIndex - 20);\n"\
	"float displaceY = (float)(zIndex - 20);\n"\
	"float deltaThetaRad = radians(deltaTheta);\n"\
	"float cosDTheta = cos(deltaThetaRad);\n"\
	"float sinDTheta = sin(deltaThetaRad);\n"\
	"float result = 0;\n"\
	"for(int index = 0; index < 361; ++index)\n"\
	"\n"\
	"float degRad = radians(index);\n"\
	"float preDeltaX = cos(degRad) * scanPoints[index];\n"\
	"float preDeltaY = sin(degRad) * scanPoints[index];\n"\
	"int deltaX = (int)(cosDTheta * preDeltaX - sinDTheta * preDeltaY + displaceX);\n"\
	"int deltaY = (int)(sinDTheta * preDeltaX + sinDTheta * preDeltaY + displaceY);\n"\
	"result = result + reference[deltaX + deltaY * 1024];\n"\
	"\n"\
	"yIndex = yIndex * 80;\n"\
	"zIndex = zIndex * 80 * 40;\n"\
	"out[xIndex + yIndex + zIndex] = result;\n"\
"}\n";
[/codebox]

Short snippet of main program:

[codebox]program = clCreateProgramWithSource(gpuContext, 1, (const char**)&kernelSource, NULL, &errorCode);
if(!program)
{
	cerr << "Error: Failed to create program: " << errorCode << endl;
	clReleaseCommandQueue(queue);
	clReleaseContext(gpuContext);
	std::cin >> temp;
	return -1;
}
errorCode = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if(errorCode != CL_SUCCESS)
{
	cerr << "Error: Failed to build program: " << errorCode << endl;
	clReleaseProgram(program);
	clReleaseCommandQueue(queue);
	clReleaseContext(gpuContext);
	std::cin >> temp;
	return -1;
}[/codebox]
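One way to test the "malformed string" hypothesis is to print exactly what the compiler will receive before calling clCreateProgramWithSource. The following is only a minimal host-side sketch: `numberLines` is a hypothetical helper, not part of the OpenCL API, and it assumes the kernel source is an ordinary null-terminated C string like `kernelSource` above.

```cpp
#include <sstream>
#include <string>

// Hypothetical debugging helper (not an OpenCL call): returns the source
// text with every line prefixed by its 1-based line number, so the string
// handed to the compiler can be inspected by eye.
std::string numberLines(const std::string &source)
{
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    int lineNumber = 1;
    while (std::getline(in, line))
    {
        out << lineNumber << ": " << line << '\n';
        ++lineNumber;
    }
    return out.str();
}
```

Calling something like `std::cerr << numberLines(kernelSource);` just before creating the program would show the concatenated literal as plain text, which makes it easier to spot things such as a missing brace or an unexpectedly empty line around the `for` loop.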

First off, get_local_id returns a size_t, which is unsigned. The compiler will implicitly convert it to int, but if the value ever climbs past what an int can hold, expect an overflow. Addressing an array with a signed value isn’t usually good practice either. I don’t know if that is your problem with building, though.
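To make the conversion concern concrete, here is a small host-side C++ sketch (plain C++, not OpenCL C). `fitsInInt` is a hypothetical helper introduced only for illustration; it checks whether an unsigned index keeps its value when stored in a signed int.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: true when an unsigned index can be stored in a
// signed int without changing its value.
bool fitsInInt(std::size_t value)
{
    return value <= static_cast<std::size_t>(INT_MAX);
}
```

A local ID such as 79 passes this check, while `static_cast<std::size_t>(INT_MAX) + 1` does not. Since local IDs are bounded by the work-group size, values that large seem unlikely in practice, so this is probably a style concern rather than the cause of a compiler crash.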

If that were the problem, I would expect it to show up at run time, or at the very least the build would fail with an error I could diagnose, instead of the NVIDIA compiler effectively segfaulting on me.