16 kB memory limitation

Hey Everyone,

I’m trying to execute this kernel. Everything works until the total memory size of the inputs exceeds 16 kB; after that, the kernel fails. I thought 16 kB was the limit for shared memory, not device memory. When arr_size is 496, everything runs fine, but if I increase arr_size to 512, I get an “invalid argument” error. Does anyone have any ideas?

Thanks,
Joe

unsigned int arr_size=496;
unsigned int mem_size = sizeof( float) * arr_size;
// host input buffer (h_x should be filled with input data before the copy below)
float* h_x = (float*) malloc( mem_size);
float* d_x;
// allocate d_x on the device and copy the host data into it
cutilSafeCall( cudaMalloc( (void**) &d_x, mem_size));
cutilSafeCall( cudaMemcpy( d_x, h_x, mem_size,cudaMemcpyHostToDevice) );
// device and host buffers for the corresponding output
float* d_ox;
cutilSafeCall( cudaMalloc( (void**) &d_ox, mem_size));
float* h_ox = (float*) malloc( mem_size);

The code above is repeated for _y, _z, _theta, _phi, _ox, _oy, and _oz.

dim3 cylgrid(arr_size/16,1,1);
dim3 cylthreads(16,1,1);

cylkernel<<<cylgrid,cylthreads,mem_size*8>>>(d_x,d_y,d_z,d_theta,d_phi,d_ox,d_oy,d_oz);

cutilCheckMsg("Kernel execution failed");
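The grid above launches arr_size/16 blocks of 16 threads. Because C integer division truncates, any arr_size that is not a multiple of 16 leaves its last few elements without a thread (and the kernel has no bounds check). The C sketch below compares that truncating block count with a ceiling division. The helper names blocks_floor, blocks_ceil and elements_covered are hypothetical, chosen only for illustration.

```c
/* Threads per block, matching cylthreads(16,1,1) in the host code. */
#define THREADS_PER_BLOCK 16u

/* Block count produced by dim3 cylgrid(arr_size/16,1,1): truncating
   division, so any remainder elements get no thread. */
static unsigned int blocks_floor(unsigned int n)
{
    return n / THREADS_PER_BLOCK;
}

/* Ceiling division: enough blocks to cover every element. The kernel
   would then need a bounds check before touching global memory. */
static unsigned int blocks_ceil(unsigned int n)
{
    return (n + THREADS_PER_BLOCK - 1u) / THREADS_PER_BLOCK;
}

/* Number of elements actually covered by a grid of `blocks` blocks. */
static unsigned int elements_covered(unsigned int blocks)
{
    return blocks * THREADS_PER_BLOCK;
}
```

For example, arr_size = 500 gives 31 blocks (496 elements covered) with truncation but 32 blocks with the ceiling version. Using the latter would also mean passing arr_size to the kernel (a hypothetical extra parameter) and guarding each access with a check such as if (bid*16+tid < arr_size).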


__global__ void mykernel( float* g_ix,float* g_iy,float* g_iz,float* g_itheta,float* g_iphi,float* g_ox,float* g_oy,float* g_oz)
{

#include "template.h"

const unsigned int tid = threadIdx.x;
const unsigned int bid = blockIdx.x;

// statically allocated shared memory: one element per thread (16 threads per block)
__shared__ float sx[16],sy[16],sz[16];

// each thread loads one element of each input into shared memory
sx[tid]=g_ix[bid*16+tid];
sy[tid]=g_iy[bid*16+tid];
sz[tid]=g_iz[bid*16+tid];

__syncthreads();
sx[tid]=10.0;
sy[tid]=10.0;
sz[tid]=10.0;

// write the results back to global memory
g_ox[bid*16+tid]=sx[tid];
g_oy[bid*16+tid]=sy[tid];
g_oz[bid*16+tid]=sz[tid];

__syncthreads();

}
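To make the kernel's data flow explicit, here is a plain-C host emulation (a sketch, not CUDA code) of the x/y/z part of the kernel. The outer loop plays the role of the blocks, the inner loops play the role of the 16 threads, and splitting the work into two loops mirrors the first __syncthreads(). The function name emulate_kernel is hypothetical. Because every thread overwrites its shared slot with 10.0 before writing it out, each covered output element ends up as 10.0 regardless of the input.

```c
#include <stddef.h>

/* Threads per block, matching the kernel's 16-element shared arrays. */
#define THREADS_PER_BLOCK 16u

/* Host-side emulation of the posted kernel for the x/y/z arrays.
   Each (bid, tid) pair stands in for one CUDA thread; the global
   index bid*16+tid matches the kernel's indexing. */
static void emulate_kernel(const float *g_ix, const float *g_iy, const float *g_iz,
                           float *g_ox, float *g_oy, float *g_oz,
                           unsigned int num_blocks)
{
    for (unsigned int bid = 0; bid < num_blocks; ++bid) {
        /* Per-block stand-ins for __shared__ float sx[16], sy[16], sz[16]. */
        float sx[THREADS_PER_BLOCK], sy[THREADS_PER_BLOCK], sz[THREADS_PER_BLOCK];

        /* Phase 1: every "thread" loads its element (before __syncthreads()). */
        for (unsigned int tid = 0; tid < THREADS_PER_BLOCK; ++tid) {
            size_t i = (size_t)bid * THREADS_PER_BLOCK + tid;
            sx[tid] = g_ix[i];
            sy[tid] = g_iy[i];
            sz[tid] = g_iz[i];
        }

        /* Phase 2: every "thread" overwrites its slot and writes it out. */
        for (unsigned int tid = 0; tid < THREADS_PER_BLOCK; ++tid) {
            size_t i = (size_t)bid * THREADS_PER_BLOCK + tid;
            sx[tid] = 10.0f;
            sy[tid] = 10.0f;
            sz[tid] = 10.0f;
            g_ox[i] = sx[tid];
            g_oy[i] = sy[tid];
            g_oz[i] = sz[tid];
        }
    }
}
```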


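You are passing mem_size*8 as the size of dynamic shared memory, so when you increase arr_size to 512, you request 16 kB (512 × 4 bytes × 8 = 16,384 bytes) of dynamic shared memory per block, in addition to the 3 * 16 * sizeof(float) = 192 bytes of shared memory that is statically allocated in the kernel body. Together that exceeds the 16 kB per-block shared memory limit, so the launch fails with "invalid argument". Your kernel never uses dynamic shared memory (there is no extern __shared__ array), so you can pass 0 for that launch parameter or leave it out.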
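The arithmetic behind that answer can be checked on the host. The C sketch below assumes 4-byte floats and a 16 kB per-block shared memory limit (SHARED_LIMIT_BYTES, as on the compute capability 1.x GPUs of the time). The helper names dynamic_request, static_shared and fits_in_shared are hypothetical.

```c
#include <stddef.h>

/* Assumed per-block shared memory limit: 16 kB, as on compute 1.x GPUs. */
#define SHARED_LIMIT_BYTES (16u * 1024u)

/* Dynamic shared memory requested by the launch: mem_size*8,
   where mem_size = sizeof(float) * arr_size. */
static size_t dynamic_request(size_t arr_size)
{
    return sizeof(float) * arr_size * 8u;
}

/* Static shared memory declared in the kernel: sx[16], sy[16], sz[16]. */
static size_t static_shared(void)
{
    return 3u * 16u * sizeof(float);
}

/* Rough check only: on compute 1.x hardware the kernel's arguments are
   also passed through shared memory, so the real headroom is smaller. */
static int fits_in_shared(size_t arr_size)
{
    return dynamic_request(arr_size) + static_shared() <= SHARED_LIMIT_BYTES;
}
```

With these numbers, arr_size = 496 requests 15,872 + 192 = 16,064 bytes, which fits, while arr_size = 512 requests 16,384 + 192 = 16,576 bytes, which does not. Passing 0 as the third launch parameter removes the dynamic part entirely, leaving only the 192 static bytes.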


Many thanks,

I had misunderstood the purpose of the memory size there. Now it works fine!

Joe
