Compiler Bug or Very Poor Quality Error Message?


As a second exercise to get familiar with the OpenCL API, I am converting my Tessendorf-style wave generator (I already converted a shallow water simulator without problems). However, the OpenCL 1.0 compiler (NVIDIA don't seem to think me important enough to allow access to 1.1 :-( ) is producing a very unhelpful error message (well, the assembler that runs after the compiler is):

Build Failed!

Errors for device “GeForce GTX 460” (Gpu) with options “-cl-mad-enable -Werror -cl-nv-verbose”:

ptxas application ptx input, line 637; fatal : Parsing error near ‘,’: syntax error
ptxas fatal : Ptx assembly aborted due to errors
error : Ptx compilation failed: gpu=‘sm_21’, device code=‘cuModuleLoadDataEx_1a’
: Considering profile ‘compute_20’ for gpu=‘sm_21’ in ‘cuModuleLoadDataEx_1a’
: Retrieving binary for ‘cuModuleLoadDataEx_1a’, for gpu=‘sm_21’, usage mode=’ --verbose ’
: Considering profile ‘compute_20’ for gpu=‘sm_21’ in ‘cuModuleLoadDataEx_1a’
: Control flags for ‘cuModuleLoadDataEx_1a’ disable search path
: Ptx binary found for ‘cuModuleLoadDataEx_1a’, architecture=‘compute_20’
: Ptx compilation for ‘cuModuleLoadDataEx_1a’, for gpu=‘sm_21’, ocg options=’ --verbose ’

Is there a way to obtain the PTX code produced by the compiler, to help trace the cause of the error? Does the 1.1 version, if it is ever released, improve this?

I haven't checked the source in detail yet; it is probably something very simple. Below is the source producing this error, to aid in reproducing the compiler problem:

    private string fftWavesSrc = @"

float2 ComplexMul(float2 a, float2 b)
{
    float re = (a.x * b.x) - (a.y * b.y);
    float im = (a.y * b.x) + (a.x * b.y);
    return (float2)(re, im);
}

float2 ComplexExpIm(float inIm)
{
    float re;
    float im = sincos(inIm, &re);

    return (float2)(re, im);
}

float2 ComplexConjugate(float2 a)
{
    return (float2)(a.x, -a.y);
}

float Motion(float2 k, float gravity)
{
    float kLen = length(k);

    return gravity*kLen;
}

//TODO: This could be optimized a fair bit.

__kernel void PreProcess(__global float2* currentHeight,
                         __global float2* initialFreq1,
                         __global float2* initialFreq2,
                         int2 resolution, float2 fftSize, float time, float gravity)
{
    int i = get_global_id(1);
    int j = get_global_id(0);

    float2 initial = initialFreq1[j + i*resolution.x];
    float2 initial2 = initialFreq2[j + i*resolution.x];

    float pi2 = (float)M_PI * 2.0f;

    float2 k;
    k.x = ((j - 0.5f * resolution.x) * pi2) / fftSize.x;
    k.y = ((i - 0.5f * resolution.y) * pi2) / fftSize.y;

    float motion = Motion(k, gravity);

    float2 result = ComplexMul(initial, ComplexExpIm(time*motion)) + ComplexMul(ComplexConjugate(initial2), ComplexExpIm(-time*motion));

    currentHeight[j + i*resolution.x] = result;
}

__kernel void PostProcess(__global float2* currentHeight, image2d_t outImage, int2 resolution)
{
    int i = get_global_id(1);
    int j = get_global_id(0);

    float2 h = currentHeight[j + i*resolution.x];

    if(((j + i) & 1)>0)
        h = (float2)(-h.x, h.y);

    //note: We need to permutate and scale the result to emulate an inverse FFT from the FFT. (ie scale by 1/N and reverse all but first row).
    float scale = 1.0f / (resolution.x * resolution.y);//TODO: Remove common sub expression.
    int destI = i;
    int destJ = j;

    if(destI > 0)
        destI = resolution.y - destI;

    if(destJ > 0)
        destJ = resolution.x - destJ;

    write_imagef(outImage, (int2)(destJ, destI), (float4)(h.x * scale, 0.0f, 0.0f, 0.0f));
}
";



Yes, there is, by passing CL_PROGRAM_BINARIES to clGetProgramInfo().

Have you tried a forum search with the PTX error message? Your problem very likely is the same as the one in this thread. In short, prefix your “image2d_t outImage” with “__write_only”.
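For illustration, here is roughly what that change looks like, written as a C string constant the way a host program might embed the kernel. This is only a sketch: `post_process_src` and `declares_write_only_image` are made-up names, and the checker is a plain substring test rather than anything the OpenCL API provides.

```c
#include <assert.h>
#include <string.h>

/* PostProcess from the question; the only change is the __write_only
   qualifier on outImage. */
const char *post_process_src =
    "__kernel void PostProcess(__global float2* currentHeight,\n"
    "                          __write_only image2d_t outImage,\n"
    "                          int2 resolution)\n"
    "{\n"
    "    int i = get_global_id(1);\n"
    "    int j = get_global_id(0);\n"
    "    float2 h = currentHeight[j + i*resolution.x];\n"
    "    if (((j + i) & 1) > 0)\n"
    "        h = (float2)(-h.x, h.y);\n"
    "    float scale = 1.0f / (resolution.x * resolution.y);\n"
    "    int destI = i;\n"
    "    int destJ = j;\n"
    "    if (destI > 0)\n"
    "        destI = resolution.y - destI;\n"
    "    if (destJ > 0)\n"
    "        destJ = resolution.x - destJ;\n"
    "    write_imagef(outImage, (int2)(destJ, destI),\n"
    "                 (float4)(h.x * scale, 0.0f, 0.0f, 0.0f));\n"
    "}\n";

/* Hypothetical pre-build check: returns 1 if the source qualifies an
   image2d_t argument as __write_only, 0 otherwise. Substring test only. */
int declares_write_only_image(const char *src)
{
    assert(src != NULL);
    return strstr(src, "__write_only image2d_t") != NULL;
}
```

The string would be handed to clCreateProgramWithSource() as usual; the checker is just a cheap guard you could run first, given how little the ptxas message says.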

Do CL_PROGRAM_BINARIES and clGetProgramInfo() return the assembly language or just machine code? Is there a tool that can convert the binary file into textual assembly? Perhaps part of the CUDA SDK? (Never mind, I read the other thread.)

Yeah, I just found the problem with a missing “write_only” and now it compiles normally. However, I noticed a worse compiler bug: if I comment out the body of the kernel, the clBuildProgram() call generates an access violation. Surely that can't be according to spec :-)



It returns human-readable PTX assembly language.
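A rough sketch of the clGetProgramInfo() route mentioned earlier in the thread, in C. The helper names and the `USE_OPENCL` switch are made up for this example, and the OpenCL part only compiles with an SDK installed (the header is `<OpenCL/cl.h>` on macOS). Two assumptions to keep in mind: the thread does not confirm that the driver still hands back PTX after a failed build, nor that the line number ptxas reports matches the returned text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy line `line_no` (1-based) of `text` into `out`, truncating if needed.
   Returns 0 on success, -1 if the text has no such line. */
int copy_line(const char *text, size_t line_no, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);
    if (line_no == 0 || out_size == 0)
        return -1;

    const char *start = text;
    for (size_t n = 1; n < line_no; ++n) {
        start = strchr(start, '\n');
        if (start == NULL)
            return -1;
        ++start;                      /* step past the newline */
    }
    if (*start == '\0')
        return -1;                    /* ran off the end of the text */

    size_t len = strcspn(start, "\n");
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

#ifdef USE_OPENCL                     /* hypothetical switch; needs an OpenCL SDK */
#include <CL/cl.h>

/* Returns a malloc'd, NUL-terminated copy of the binary for the program's
   first device, or NULL on failure. The caller frees it. */
char *fetch_first_binary(cl_program program)
{
    cl_uint n = 0;
    if (clGetProgramInfo(program, CL_PROGRAM_NUM_DEVICES,
                         sizeof n, &n, NULL) != CL_SUCCESS || n == 0)
        return NULL;

    size_t *sizes = calloc(n, sizeof *sizes);
    unsigned char **bins = calloc(n, sizeof *bins);
    char *result = NULL;
    int ok = sizes != NULL && bins != NULL &&
             clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                              n * sizeof *sizes, sizes, NULL) == CL_SUCCESS;

    /* CL_PROGRAM_BINARIES fills one caller-allocated buffer per device. */
    for (cl_uint d = 0; ok && d < n; ++d) {
        bins[d] = malloc(sizes[d] + 1);
        ok = bins[d] != NULL;
    }
    if (ok && sizes[0] > 0 &&
        clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                         n * sizeof *bins, bins, NULL) == CL_SUCCESS) {
        bins[0][sizes[0]] = '\0';     /* make the PTX printable as a string */
        result = (char *)bins[0];
        bins[0] = NULL;               /* ownership moves to the caller */
    }
    for (cl_uint d = 0; bins != NULL && d < n; ++d)
        free(bins[d]);
    free(bins);
    free(sizes);
    return result;
}
#endif
```

With the text in hand, something like `copy_line(ptx, 637, buf, sizeof buf)` would pull out the line that ptxas complained about in the log above.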