Runtime API to driver API problem


I have a little problem while rewriting an application for the driver API. The main reason for this conversion is context migration via cuCtxPushCurrent()/cuCtxPopCurrent(): I’m writing a multithreaded tool, so I have to control when a thread has access to a given CUDA context, and contexts have to migrate from one thread to another. That part really works, but now I get error 0x2BD (701, CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES) when I launch the kernel.

I did not change the parameters: same grid and block size (and thus the same shared memory usage), but as I said, CUDA returns the error.

I’ll post my kernel call; maybe somebody has an idea what the problem is:

dim3 dimGrid(dimGX,dimGY,1);

dim3 dimBlock(dimBX,dimBY,1);

kernel1<<<dimGrid, dimBlock>>>(in, pitch, integer1, integer2, conf);

where the kernel is declared as:

__global__ void kernel1(unsigned int * in, size_t pitch, int integer1, int integer2, refinementConfiguration conf)

where refinementConfiguration is the following struct:

struct refinementConfig {

  int int1; 

  int int2; 

  float fl1;

  int int3;

  int int4; 

  int int5;
};

typedef struct refinementConfig refinementConfiguration;

The kernel parameters and refinementConfig stay the same, but I’m using the driver API for the invocation:

int offset=0;

cuParamSetv(cuFunction, offset, &in, sizeof(unsigned int *)); // address of the pointer, not the pointer itself

offset+=sizeof(unsigned int *);

cuFuncSetSharedSize(cuFunction, 0); // no extern __shared__ array; all shared memory is statically sized in the kernel

//Launch the Kernel



With --ptxas-options=-v I get the following output for a stand-alone version of the program, which runs fine:

Used 32 registers, 60+36 bytes lmem, 15880+12808 bytes smem, 72 bytes cmem[1]

By the way, I have an NVIDIA GTX 280 with 1 GB of RAM.

Does anyone have an idea what the problem is? Is there something I’ve forgotten? Could my own struct be a problem for the driver API? Keep in mind that the whole program runs fine with the runtime API, both stand-alone and integrated into another program.

Best regards


Looks like you ran out of shared memory: sm_13 has 16 KB of shared memory per SM, while the ptxas output shows your kernel exceeding that limit…

Register usage is also quite high. The GTX 280 has 16 K registers per SM, so if (threads per block) × (registers per thread) exceeds 16 K, your kernel will fail to launch.

Thanks for your reply.

I don’t think that’s the problem. That ptxas output is from the stand-alone program, which has NO problem. I have merely integrated this kernel into another program, and when I use the runtime API everything works fine. BUT when I switch from runtime API to driver API calls, the kernel fails…

I also reduced the shared memory usage by decreasing the block size, but the error remains the same…

Are there other reasons for this error? Could it be caused by an earlier call? I don’t think so, because ALL calls are wrapped in CU_SAFE_CALL(), but is it possible?

Best regards