Help with 2D array passed to a Kernel: warning : Cannot tell what pointer points to, assuming global

I am trying to operate on a 2D array in a kernel. I am modeling what I am doing after the setup for the inputs to cublasSGemmBatched() (from the CUDA sample batchCUBLAS) where you send a pointer to a 2D pointer to the kernel. But I get a warning upon compile: “warning : Cannot tell what pointer points to, assuming global memory space”

and the program crashes when I try to run it. This manner of expressing data seems to work for GemmBatched, but clearly I am missing something here.

Here is my setup:

Memory allocation:

unsigned **dev_matrix, **dev_Ptrmatrix;//matrix and pointer to matrix on device

dev_matrix = (unsigned **)malloc(param->N_mat * sizeof(*dev_matrix));//N_mat is the number of matrices

for (int i = 0; i < param->N_mat; i++)
{
     checkCudaErrors(cudaMalloc((void **)&dev_matrix[i], param->npts * sizeof(dev_matrix[0][0]))); 
}

checkCudaErrors(cudaMalloc((void **)&dev_Ptrmatrix, param->N_mat * sizeof(*dev_Ptrmatrix)));

cudaError_t err = cudaMemcpy(dev_Ptrmatrix, dev_matrix, param->N_mat * sizeof(dev_matrix[0]),cudaMemcpyHostToDevice);

kernel:

dim3 blocks=dim3(param->N_mat,param->ni);//ni and nj are the rows and columns
cudakern_fillMatrix<<<blocks,param->nj,0,param->streamArray[0]>>>(dev_Ptrmatrix,0);

//kernel is defined as:

__global__ void cudakern_fillMatrix(unsigned* matrix[],unsigned m)
{
	int j=threadIdx.x;//column index, 0..nj-1
	int i=blockIdx.y;//row index, 0..ni-1
	int nj=blockDim.x;//number of columns per row
	int n=blockIdx.x;//which matrix, 0..N_mat-1
	matrix[n][IDX2R(i,j,nj)]=m;//IDX2R(i,j,nj)=i*nj+j//this line is where the warning appears!
}

Any enlightenment is welcome and appreciated!

GPUs prior to compute capability 2.0 (= sm_20) did not provide a unified memory space. Rather, they implemented multiple separate memory spaces, such as shared memory, constant memory, and global memory. All addresses were specific to each particular address space.

Addresses in C/C++ are represented by pointers, and in general “a pointer is a pointer is a pointer,” meaning that there is a single unified address space. Since CUDA implements a C/C++-style language, the compiler for sm_1x platforms tries to bridge the gap between the abstract C/C++ model and what is actually provided by the hardware by attempting to track the memory space each pointer pertains to. If the code gets too complex (often this involves pointers to pointers), it gives up and simply assumes the pointer in question points to global memory. It warns about this because the generated code will be non-functional if the compiler’s assumption is incorrect (e.g. the pointer points to shared memory, not global memory). As long as the programmer can ensure that the compiler default of global memory is correct, the warning can be ignored.

In older versions of nvcc, the compiler would select sm_10 as the default build target. If you do not intend to build for sm_1x platforms, you can simply specify the desired build target with the -arch or -gencode command-line switches of nvcc and these warnings will disappear. Note that support for sm_1x is deprecated in CUDA 6.5 and the default target architecture for nvcc is now sm_20. Support for sm_1x will be removed entirely in the toolkit for the next CUDA release; it has already been removed in the latest drivers.
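For example, a typical invocation restricting the build to compute capability 2.0 and newer might look like this (the file and program names here are placeholders, and the exact flags depend on your project):

```
# build only for compute capability 2.0, avoiding the sm_1x warning
nvcc -arch=sm_20 -o myprog myprog.cu

# or target several architectures explicitly
nvcc -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_30,code=sm_30 -o myprog myprog.cu
```

With no sm_1x target in the list, the compiler never has to guess which memory space a pointer refers to, and the warning does not appear.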

Compute capability 2.0 (sm_20) and above support a unified memory space, so the memory model provided by the hardware now matches the abstract memory model used by C/C++. All modern CUDA features, such as the use of an ABI, C++ class support, and device-side printf(), require sm_20 at a minimum.

If you want to provide a short, complete code that I can compile and run, I’ll take a look.

Many thanks!

I was compiling for all archs, including those below 2.0. Taking that out made the warning messages disappear. I am using CUDA 6.0, and I am trying to update an older code (at the rate of NVIDIA’s releases, 1-2 year-old code is already outdated; in fact I should now update to 6.5!). As for the program crash, it was unrelated to the warning message. I had a memory issue somewhere much earlier in the program that caused it to crash when I transferred my matrix from the GPU. Took me almost all day to find it. But I wouldn’t have gotten on the right trail if I hadn’t discounted this warning first. Thanks again, and for the great explanation!