Hello,
I have a small problem while rewriting an application for the driver API. The main reason for the conversion is the use of cuCtxContext(). I’m writing a multithreaded tool, so I have to control when a thread has access to a given CUDA context, and contexts have to migrate from one thread to another. That part works, but now I get error 2bd (the CUDA driver error for out of resources) when I launch the kernel.
I did not change the parameters: the grid and block sizes are the same (and therefore so is the smem size, etc.), but as I said, CUDA returns the error.
Here is my kernel call; maybe somebody has an idea what the problem is:
dim3 dimGrid(dimGX,dimGY,1);
dim3 dimBlock(dimBX,dimBY,1);
kernel1<<<dimGrid,dimBlock>>>(in , pitch, integer1 , integer2 , conf);
where the kernel is defined as:
__global__ void kernel1(unsigned int * in, size_t pitch, int integer1, int integer2, refinementConfiguration conf)
and refinementConfiguration is the following struct:
struct refinementConfig {
int int1;
int int2;
float fl1;
int int3;
int int4;
int int5;
};
typedef struct refinementConfig refinementConfiguration;
The kernel parameters and refinementConfig stay the same, but I’m using the driver API for the invocation:
int offset=0;
cuParamSetv(cuFunction,offset,in,sizeof(unsigned int *));
offset+=sizeof(unsigned int *);
cuParamSeti(cuFunction,offset,pitch);
offset+=sizeof(int);
cuParamSeti(cuFunction,offset,integer1);
offset+=sizeof(int);
cuParamSeti(cuFunction,offset,integer2);
offset+=sizeof(int);
cuParamSetv(cuFunction,offset,&conf,sizeof(refinementConfiguration));
offset+=sizeof(refinementConfiguration);
cuParamSetSize(cuFunction,offset);
cuFuncSetSharedSize(cuFunction,0); // No extern shared[] memory; the shared arrays are statically sized in the kernel
//Launch the Kernel
cuFuncSetBlockShape(cuFunction,dimBlock.x,dimBlock.y,dimBlock.z);
cuLaunchGrid(cuFunction,m_dimGrid.x,m_dimGrid.y);
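For reference, here is a minimal host-side sketch of how the byte offsets for these five parameters could be computed so that each one starts at its natural alignment. It is not a confirmed diagnosis of the error. It assumes the device-side parameter layout matches the host's size and alignment for these types, and the names alignUp, ParamLayout and computeParamLayout are hypothetical helpers that do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>

// Same fields as the struct shown in the post.
struct refinementConfig {
    int int1;
    int int2;
    float fl1;
    int int3;
    int int4;
    int int5;
};
typedef struct refinementConfig refinementConfiguration;

// Round offset up to the next multiple of alignment (must be a power of two).
inline std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper type: byte offset of each kernel1 parameter
// plus the total size of the parameter block.
struct ParamLayout {
    std::size_t in;
    std::size_t pitch;
    std::size_t integer1;
    std::size_t integer2;
    std::size_t conf;
    std::size_t total;
};

// Walk the parameter list in declaration order, aligning before each
// parameter and advancing by that parameter's own size afterwards.
inline ParamLayout computeParamLayout() {
    ParamLayout p;
    std::size_t offset = 0;

    offset = alignUp(offset, alignof(unsigned int *));
    p.in = offset;
    offset += sizeof(unsigned int *);

    offset = alignUp(offset, alignof(std::size_t));
    p.pitch = offset;
    offset += sizeof(std::size_t);   // size_t, not int

    offset = alignUp(offset, alignof(int));
    p.integer1 = offset;
    offset += sizeof(int);

    offset = alignUp(offset, alignof(int));
    p.integer2 = offset;
    offset += sizeof(int);

    offset = alignUp(offset, alignof(refinementConfiguration));
    p.conf = offset;
    offset += sizeof(refinementConfiguration);

    p.total = offset;
    return p;
}
```

The resulting offsets could then be passed to the cuParamSetv / cuParamSeti calls, and p.total to cuParamSetSize. On a 64-bit host sizeof(size_t) is typically 8, so advancing by sizeof(int) after pitch may shift every later parameter. cuParamSeti also takes an unsigned int value, so a size_t would probably need cuParamSetv with &pitch instead.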
With --ptxas-options=-v I get the following output for a stand-alone program that runs:
Used 32 registers, 60+36 bytes lmem, 15880+12808 bytes smem, 72 bytes cmem[1]
By the way, I have an NVIDIA GTX280 with 1 GB of RAM.
Does anyone have an idea what the problem is? Is there something I’ve forgotten? Could my own struct be a problem for the driver API? Keep in mind that the whole program runs with the runtime API, both stand-alone and integrated into another program.
Best regards
Patrick