Hi,
In an attempt to add optimizations for various card types, I have been changing how I call my kernels, from calling them directly to executing them via a function pointer. I am likely doing something really stupid, but with a frazzled brain I'm prepared to look silly on the internet and ask.
I have a class containing a function pointer to a kernel, along with a function that assigns that member variable based on certain program settings. Some of the kernels are templates, but I believed that as long as I assign a full instantiation of the template, everything should be fine?
Assigning the function pointer seems to work fine, but as soon as I try to actually call through it I get access violations deep in the bowels of the CUDA runtime.
The actual crash happens in _cudaMutexOperation, inside the first cudaSetupArgument.
//Header:
typedef void (*fPtrType)(float, float, cuComplex const*, cuComplex*, long , long );
class CDevice {
private:
    fPtrType m_funcPtr;
public:
    void SetFuncPtr(settings s);
    // Declared here, defined in the source file below.
    void ExecuteFunc(float a, float b, cuComplex const* c, cuComplex* d, long e, long f);
};
//Source:
__global__ void funcOpt1(float a, float b, cuComplex const* c, cuComplex* d, long e, long f) {/*DOES STUFF */ };
template<int ti>
__global__ void funcOpt2(float a, float b, cuComplex const* c, cuComplex* d, long e, long f) {/* DOES STUFF DEPENDING ON ti */};
void CDevice::ExecuteFunc(float a, float b, cuComplex const* c, cuComplex* d, long e, long f)
{
    (*m_funcPtr)(a,b,c,d,e,f); // the access violation is raised from this call
}
void CDevice::SetFuncPtr(settings s)
{
    if (s.UseFunc1)
    {
        m_funcPtr = funcOpt1;
    }
    else
    {
        if (s.TemplateOne)
        {
            m_funcPtr = funcOpt2<1>;
        }
        else
        {
            m_funcPtr = funcOpt2<0>;
        }
    }
}
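
To rule out the template side of things, here is a minimal host-only sketch of the same selection pattern in plain C++. It is only an analogue under stated assumptions: the functions are ordinary host functions rather than __global__ kernels, Complex is a hypothetical stand-in for cuComplex, selectFunc is a hypothetical free function mirroring SetFuncPtr, and the function bodies are made up for illustration. It does not exercise the CUDA launch path at all, so it says nothing about the crash itself.

```cpp
// Hypothetical stand-in for cuComplex so the sketch builds without CUDA headers.
struct Complex { float x, y; };

// Same signature as fPtrType in the post, with Complex in place of cuComplex.
typedef void (*fPtrType)(float, float, Complex const*, Complex*, long, long);

// Non-template variant (illustrative body): scales each element by a.
void funcOpt1(float a, float, Complex const* c, Complex* d, long n, long) {
    for (long i = 0; i < n; ++i) {
        d[i].x = a * c[i].x;
        d[i].y = a * c[i].y;
    }
}

// Template variant (illustrative body): ti picks the scale factor at compile time,
// a when ti == 0 and b otherwise.
template <int ti>
void funcOpt2(float a, float b, Complex const* c, Complex* d, long n, long) {
    const float scale = ti ? b : a;
    for (long i = 0; i < n; ++i) {
        d[i].x = scale * c[i].x;
        d[i].y = scale * c[i].y;
    }
}

// Hypothetical selector mirroring CDevice::SetFuncPtr. Each explicit
// instantiation (funcOpt2<0>, funcOpt2<1>) is an ordinary function whose
// address converts to fPtrType.
fPtrType selectFunc(bool useFunc1, bool templateOne) {
    if (useFunc1)    return funcOpt1;
    if (templateOne) return funcOpt2<1>;
    return funcOpt2<0>;
}
```

On the host, calling the returned pointer, e.g. selectFunc(false, true)(a, b, in, out, n, 0), dispatches to funcOpt2<1> as expected, so assigning a full template instantiation to the pointer looks fine in itself.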