Compile problems with global arrays
Global array compile-time indexing fails, but local arrays work

Greetings,

I’ve got some code that does fixed indexing into global memory, but PTX doesn’t like it and won’t build the program. If I do the same thing with indexing into local memory, PTX likes it just fine. Here’s some example code:

__kernel void test_kernel1( __global float *some_data, __global float cov_matrix[9] )
{
    // does not compile
    cov_matrix[0] = 0.0f;
    cov_matrix[1] = 0.0f;
    cov_matrix[2] = 0.0f;
    cov_matrix[3] = 0.0f;
    cov_matrix[4] = 0.0f;
    cov_matrix[5] = 0.0f;
    cov_matrix[6] = 0.0f;
    cov_matrix[7] = 0.0f;
    cov_matrix[8] = 0.0f; // <- compile error here
}

This yields the following message:

But if I try to build this:

__kernel void test_kernel2( __global float *some_data )
{
    __local float cov_matrix[9];

    // compiles
    cov_matrix[0] = 0.0f;
    cov_matrix[1] = 0.0f;
    cov_matrix[2] = 0.0f;
    cov_matrix[3] = 0.0f;
    cov_matrix[4] = 0.0f;
    cov_matrix[5] = 0.0f;
    cov_matrix[6] = 0.0f;
    cov_matrix[7] = 0.0f;
    cov_matrix[8] = 0.0f;
}

it compiles without any problems.

For the life of me, I can’t figure out what’s causing this. I’m using the 190.29 driver on 32-bit Ubuntu 9.10. Any ideas? BTW, I had tried to upgrade to the 195.36.15 driver, but it was a disaster on my machine, so I uninstalled the new driver and reinstalled the old one. There might be some residual 195.36.15 badness floating around.

Thanks in advance!

Get rid of that [9] and make cov_matrix a pointer, like some_data. Declaring it with [9] implies that the kernel is allocating global memory, which is not allowed. You can still access it like cov_matrix[3].

cov_matrix is a kernel argument. You have to:

  • create a mem object using clCreateBuffer(); this is where your 9 is specified (actually 36, since the size is given in bytes)
  • possibly copy initial values from the host to it using clEnqueueWriteBuffer()
  • assign the mem object as a parameter to the kernel, at least once, using clSetKernelArg()
  • execute the kernel as many times as needed
  • possibly copy the values back to the host using clEnqueueReadBuffer()
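
For reference, a minimal sketch of the kernel with the suggested change applied, so that cov_matrix is declared as a plain __global pointer. The #ifndef shim and the COV_MATRIX_LEN name are assumptions added for illustration and are not from the original posts. The shim only lets the same source be compiled as ordinary C on the host to sanity-check the body, and an OpenCL compiler should skip it.

```c
/* Assumption for illustration: when this file is compiled as plain C on the
   host, stub out the OpenCL qualifiers so the kernel body can be checked
   without a device. OpenCL C compilers define __OPENCL_VERSION__ and skip this. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical name: 9 floats for the 3x3 matrix, i.e. 36 bytes on the host side. */
#define COV_MATRIX_LEN 9

/* cov_matrix is now a pointer, like some_data; the host owns the allocation. */
__kernel void test_kernel1(__global float *some_data, __global float *cov_matrix)
{
    (void)some_data; /* unused in this reduced example */

    /* Fixed indices such as cov_matrix[3] work the same way as this loop. */
    for (int i = 0; i < COV_MATRIX_LEN; ++i) {
        cov_matrix[i] = 0.0f;
    }
}
```

On the host, the buffer passed as argument 1 would then be created with clCreateBuffer() using a size of 9 * sizeof(cl_float), which is the 36 bytes mentioned above, and bound with clSetKernelArg() before the kernel is enqueued.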

D’oh! Thanks for pointing this out – I hadn’t realized this.