Warnings on array of pointers assigned to externed shared memory

MattWarmuth · April 2, 2014, 2:45am

Trying to make a templated demosaic resampler for all four RGGB configurations (format) that can use different resampler methods (class O, op()). Below is the front matter. On the last line below (tile[0][shridx]=…; lots more code isn’t included), the compiler tells me “Cannot tell what pointer points to, assuming global memory space” (which is wrong). I would have thought that line 22 (tile[0] = &shrdata[0];) should make it pretty clear (later it gives me the same warning when trying to use the other pointers…).

I’ve gotten this error before from bad syntax, but I don’t understand the issue here. I know I can use pointers to shared memory (done it before), and know I can do local fixed arrays. Can I not do both simultaneously?

template<class O>
__global__ void demosaic( unsigned int *RGBA, const size_t RGBApitch, const float *src, const size_t src_pitch,
			float *Y, const size_t Ypitch, ushort2 *UV, const size_t UVpitch,
			const uint2 imgsize, const unsigned int format, float normscale, O op )
{
	// Shared memory for color planes
	extern __shared__ float shrdata[];
	// Number of horizontal color samples/tile
	unsigned int shrwidth = blockDim.x+4;

	const int ix = UMAD(blockIdx.x,blockDim.x,threadIdx.x);
	int iy = INT_DOUBLE(UMAD(blockIdx.y,blockDim.y,threadIdx.y));
	float2 idata, *f2ptr;
	float *tile[4];

	// Offset into each shared memory color plane
	unsigned int temp = threadIdx.x+2;
	unsigned int shridx = UMAD(2+threadIdx.y,shrwidth,temp);
	// Stride (in samples) per color plane
	unsigned int blocksize = UMUL(shrwidth,threadIdx.y+4);
	// Pointers to the four color planes (UL, UR, LL, LR)
	tile[0] = &shrdata[0];
	tile[1] = &tile[0][blocksize];
	tile[2] = &tile[1][blocksize];
	tile[3] = &tile[2][blocksize];
	//////////////////////////////////
	// Load pairs of 'even' row pixels
	//////////////////////////////////
	// Get pointer to input image
	f2ptr=(float2*)((char*)src + UMUL(iy,src_pitch));
	// Are we in the valid range?
	if (ix<imgsize.x) { 
		// Load middle pixel values in pairs
		idata = f2ptr[ix];
		// Save UL pixel to shared memory
		tile[0][shridx]=idata.x;

Also tried

tile[0] = shrdata;
tile[1] = &shrdata[blocksize];

to be more explicit, but this didn’t change anything.

Googled ‘array of pointers shared memory cuda’, but kept getting links to partitioning the allocation of dynamically sized shared memory into different sections, which is obviously what I’m trying to do here (not even different types), but I can’t seem to make an array of pointers to these color planes within the shared memory.

If I make these four local variables (not an array), the compiler seems happy…

float *ULtile, *URtile, *LLtile, *LRtile;
...
	ULtile = shrdata;
	URtile = &ULtile[blocksize];
...
	ULtile[shridx]=idata.x;

…but I don’t understand why I should have to do this.

The idea is that I’ll use a variable for the first index to get to the correct pointer, but I’d have to use an if/then block to choose the correct pointer using the method just described.

allanmac · April 2, 2014, 3:20am

Are you specifying an architecture (-arch) of at least sm_20? It’s pretty easy to produce the “cannot tell” warning on pre-sm_20 architectures.

njuffa · April 2, 2014, 3:35am

As allanmac points out these warnings are specific to sm_1x. The underlying issue is that in C/C++ “a pointer is a pointer is a pointer”, but at the hardware level, there are no generic pointers in sm_1x, only memory-space specific pointers. So the compiler tries to track the memory space under the hood. This works well when there is just a single level of dereferencing, but it can already break down when there are two levels of dereferencing.

Here “tile” is an array of pointers to float, so an access such as tile[0][shridx] involves two levels of dereferencing. Once the compiler loses track of the memory space, it defaults to the global memory space and warns that it is operating on that assumption (which could well be wrong, in which case the machine code would be wrong and the program would not work correctly).

In sm_20 and later architectures, the hardware was enhanced to support generic pointers, and conversion instructions were added for converting between generic and memory-space specific pointers, as the latter may have performance advantages. So now there is a good match between the C/C++ world view and the hardware world view. If your GPU is compute capability 2.0 or higher, use the appropriate -arch or -gencode flag during compilation, and these warnings should disappear.
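To illustrate the point about generic pointers, here is a minimal sketch of an array of pointers into dynamically sized shared memory. It is not the poster's demosaic kernel: the names plane_pointers, plane_elems and run_plane_pointers_demo are made up for this example, and the plane size is simply one float per thread. The assumption is that it is built for compute capability 2.0 or higher, where the pointers in tile[] can be treated as generic pointers.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Build sketch (assumption): nvcc -arch=sm_20 example.cu, substituting
// whatever architecture your toolkit and GPU actually support.

// Hypothetical kernel: four equally sized planes carved out of one
// dynamic shared-memory allocation through a local array of pointers.
__global__ void plane_pointers(float *out, unsigned int plane_elems)
{
    extern __shared__ float shrdata[];

    float *tile[4];
    tile[0] = shrdata;
    tile[1] = tile[0] + plane_elems;
    tile[2] = tile[1] + plane_elems;
    tile[3] = tile[2] + plane_elems;

    // Each thread writes one value into each plane.
    for (int p = 0; p < 4; ++p)
        tile[p][threadIdx.x] = 10.0f * p + threadIdx.x;
    __syncthreads();

    // Read the neighbouring thread's value back through the same pointers.
    unsigned int n = (threadIdx.x + 1) % blockDim.x;
    for (int p = 0; p < 4; ++p)
        out[p * blockDim.x + threadIdx.x] = tile[p][n];
}

// Host helper: launches one block and checks every value that comes back.
bool run_plane_pointers_demo()
{
    const unsigned int threads = 32;
    const size_t shared_bytes = 4 * threads * sizeof(float);
    float h_out[4 * threads];
    float *d_out = NULL;

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;
    plane_pointers<<<1, threads, shared_bytes>>>(d_out, threads);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (unsigned int p = 0; p < 4; ++p)
        for (unsigned int t = 0; t < threads; ++t)
            if (h_out[p * threads + t] != 10.0f * p + (t + 1) % threads)
                return false;
    return true;
}
```

Calling run_plane_pointers_demo() from main() and checking the return value is enough to try it. Note that the plane index here comes from fixed-trip-count loops the compiler can unroll; a truly run-time index into tile[] is a separate question.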

allanmac · April 2, 2014, 3:54am

The other issue with creating a dynamically accessed fixed-size array is it’s going to reside in local memory instead of in registers. An STL/LDL operation is probably not what you want. Check the verbose output of your compile to see how many bytes of “lmem” are being used.

I would suggest either:

1. Keep it simple and always recalc your "offset" index with a multiply or multiply-add of your const blocksize.
2. Get medieval and use the PRMT opcode to dynamically select one of 4 precalculated 16-bit tile offsets in shared.
3. (what else?)

Of course, if the array is not dynamically accessed then you have nothing to worry about. :)
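The first suggestion could look roughly like the sketch below, which drops the pointer array and recomputes the shared-memory index from a run-time plane number with a multiply-add. Everything here is illustrative rather than taken from the original kernel: plane_offsets, plane_elems, run_plane_offsets_demo and the way the plane is derived from format are assumptions, and plain arithmetic stands in for the UMAD/UMUL macros, whose definitions are not shown in the thread.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// To check local-memory use as suggested above, ptxas can report it
// (e.g. nvcc -Xptxas -v example.cu); look for the "lmem" figure.

// Hypothetical kernel: the plane is chosen at run time, and the index into
// shared memory is recomputed as plane * plane_elems + offset on each access,
// so no local array of pointers is needed.
__global__ void plane_offsets(float *out, unsigned int plane_elems,
                              unsigned int format)
{
    extern __shared__ float shrdata[];

    // Run-time plane selection (made-up rule, only for this example).
    unsigned int plane = (format + threadIdx.x) & 3u;

    // Multiply-add replaces the lookup tile[plane][threadIdx.x].
    unsigned int idx = plane * plane_elems + threadIdx.x;

    shrdata[idx] = 100.0f * plane + threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = shrdata[idx];
}

// Host helper: launches one block and checks every value that comes back.
bool run_plane_offsets_demo(unsigned int format)
{
    const unsigned int threads = 32;
    const size_t shared_bytes = 4 * threads * sizeof(float);
    float h_out[threads];
    float *d_out = NULL;

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;
    plane_offsets<<<1, threads, shared_bytes>>>(d_out, threads, format);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (unsigned int t = 0; t < threads; ++t) {
        unsigned int plane = (format + t) & 3u;
        if (h_out[t] != 100.0f * plane + t) return false;
    }
    return true;
}
```

The trade-off is one extra integer multiply-add per access in exchange for keeping everything in registers; whether that beats a local-memory lookup in a real kernel is something to measure rather than assume.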

MattWarmuth · April 2, 2014, 9:41am

I’m compiling for 1.1 and 2.0 (thought I was also building for 1.2 and 3.0, but forgot to copy that line in the Release properties to the Debug side). And yes, when I saw this before, it would have been back on a compute 1.1 card (back when I was stylin’ with my 8600 GT) when I started out. We still have a C1060 that we run/test on.

Many thanks to both of you for explaining this. Surprised I haven’t run into this before. The PRMT does indeed look medieval; I’d probably take #1 before that, but I’ll play around with it.
