can I transfer an short array of pointer to __global__ function?

Hello all,

I have a host array of size N, each elem is a CUDA device memory buffer.

My CUDA kernel needs to use these N input buffers, I searched and people say there are two ways to do it: merge all these buffers into a single buffer and send the pointer as the global function parameter, or create a device buffer to store the pointers of N buffers.

Is it possible I just transfer this host array of device pointer to global function as a parameter? The array size is pretty small, < 32.

Thanks,
Shuang

Yes, < 32 pointers is not very large. I do this all the time but with smaller numbers of pointers.

Try it out.

You’ll typically want to make sure you’re uniformly accessing the pointer array (every warp lane reads the same pointer), otherwise if performance isn’t suitable you might want to copy it to SMEM or try one of the other approaches you list.

#define READ_ONLY  const
#define READ_WRITE // nothing

struct my_args {
  .
  . // other stuff
  .  
  Type0  READ_ONLY  * pointer0;
  Type1  READ_WRITE * pointer1;
  ...
  Type31 READ_ONLY  * pointer31;
  .
  .
  .
  TypeX  READ_WRITE * pointers[32];
};

// launch the kernel
my_kernel<<<...>>>(my_args);

The actual byte limit on the parameter block escapes me but I think it is like 4096 bytes? It’s somewhere in the manual.

Here’s a quick example:

// ---------------------------------------

typedef int* kernel1_args[10];

extern "C" __global__ 
void kernel1(kernel1_args args)
{
  for (int ii=0; ii<10; ii++)
    args[ii][threadIdx.x] = 0;
}

// ---------------------------------------

struct kernel2_args {
  int* array[10];
};

extern "C" __global__ 
void kernel2(struct kernel2_args args) 
{
  for (int ii=0; ii<10; ii++)
    args.array[ii][threadIdx.x] = 0;
}

// ---------------------------------------

If you compile it with:

nvcc -arch sm_50 -Xptxas=-v -cubin kernel.cu

and dump it with:

cuobjdump.exe -sass kernel.cubin

You’ll see that the struct arg version (“kernel2”) does what you want:

IADD R2.CC, R0.reuse, c[0x0][0x150];
STG.E [R6], RZ;        }            
IADD.X R3, R11.reuse, c[0x0][0x154];
IADD R4.CC, R0.reuse, c[0x0][0x158];
STG.E [R2], RZ;        }            
IADD.X R5, R11.reuse, c[0x0][0x15c];

tru dat

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-parameters