Problems calling __device__ function in __host__ __device__ function

Hi,

I have a problem calling a __host__ __device__ function. It seems that the program always tries to call the compiled host code instead of the device code.

My code looks like this:

main.cu
...
void foo(int *a, int *b){
  dev_sth(a,b);
}
...
foo(a,b);
...
sth.h
...
void do_sth(int *a, int *b);
void dev_sth(int *a, int *b);
sth.cu
...
__host__ __device__ void do_sth(int *a, int *b){
  a[1]=....
//some code
}
...
__global__ void sth(int *a, int *b){
  do_sth(a,b);
}

...
void dev_sth(int *a, int *b){
 if(use_cuda){
  sth<<<blockspergrid, threadsperblock>>>(a,b);
 }else{
  do_sth(a,b);
 }
}

So the code does the following:

  1. call function foo
  2. foo calls dev_sth
  3. dev_sth checks whether to use the CPU or the GPU code; the CPU version (do_sth(...)) runs fine, but with CUDA:
  4. start __global__ function sth(...), which calls do_sth(...)

It now looks as if the do_sth(...) call inside the __global__ function ends up in the host version of do_sth instead of the device version.
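
The call chain in the steps above can be condensed into a single file. This is only a sketch under assumptions: the body of do_sth, the extra index and length parameters (i, n), the use_cuda argument and the block size of 256 are made up for illustration, since the original code elides them. It shows the __global__ function sth being launched on the GPU path and do_sth being called directly on the CPU path.

```cuda
#include <cuda_runtime.h>

// nvcc compiles this function twice: once for the host, once for the device.
// The body is a placeholder (assumption); the original post elides it.
__host__ __device__ void do_sth(int *a, const int *b, int i) {
    a[i] = 2 * b[i];
}

// Kernel: one thread per element. The guard keeps threads of a partially
// filled last block from touching memory past the end of the arrays.
__global__ void sth(int *a, const int *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        do_sth(a, b, i);  // device version is used here
    }
}

// Dispatcher. a and b must be device pointers when use_cuda is true
// and ordinary host pointers otherwise.
cudaError_t dev_sth(int *a, const int *b, int n, bool use_cuda) {
    if (use_cuda) {
        const int threadsperblock = 256;  // assumed launch configuration
        const int blockspergrid = (n + threadsperblock - 1) / threadsperblock;
        sth<<<blockspergrid, threadsperblock>>>(a, b, n);

        cudaError_t err = cudaGetLastError();  // reports launch errors
        if (err != cudaSuccess) {
            return err;
        }
        return cudaDeviceSynchronize();        // reports errors raised while the kernel ran
    }

    for (int i = 0; i < n; ++i) {
        do_sth(a, b, i);  // host version is used here
    }
    return cudaSuccess;
}
```

On the CPU path, dev_sth(a, b, n, false) with plain host arrays is enough. On the GPU path the caller is expected to pass pointers that come from a device allocation such as cudaMalloc.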

Am I doing something totally wrong? I would be really thankful if anyone has a hint or a solution.

Thanks in advance.

What makes you think so?

If I comment out all of the host functions, everything works fine. Furthermore, when I debug with Nsight, oddly only some of the threads hit the following error:

Status: Exception
Exception: OutOfRangeLoad
Exception Details: MemorySpace=Global Size=4

Some more news. The compiler says “Cannot tell what pointer points to, assuming global memory”, pointing to the line with “do_sth(a,b)” inside the global function, so I think I’m on the right track.

Now the question: how do I tell the compiler that the pointer refers to device memory?
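
As far as I know, this is not something a compiler switch decides: which memory a pointer refers to depends on how it was allocated. As a hedged illustration (the kernel add_one, the helpers blocks_for and add_one_on_gpu, and the block size are invented for this sketch and are not part of the code above), the host buffer is copied into a cudaMalloc allocation and only that device pointer is handed to the kernel:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: increments every element of a device array.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Number of blocks needed to cover n elements, rounded up.
inline int blocks_for(int n, int threadsperblock) {
    return (n + threadsperblock - 1) / threadsperblock;
}

// Takes a host array, runs add_one on a device copy, and copies the result back.
cudaError_t add_one_on_gpu(int *host_data, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);

    int *dev_data = nullptr;
    cudaError_t err = cudaMalloc(&dev_data, bytes);  // device allocation
    if (err != cudaSuccess) {
        return err;
    }

    err = cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threadsperblock = 256;  // assumed block size
        // Only the device pointer is passed to the kernel.
        add_one<<<blocks_for(n, threadsperblock), threadsperblock>>>(dev_data, n);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy waits for the kernel to finish before reading the result.
        err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_data);
    return err;
}
```

Passing host_data straight to add_one instead would make the kernel dereference a host address, which is one common way to end up with out-of-range loads in some or all threads.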

Hi,

In case anyone runs into the same issue: I was able to solve the problem. I don’t know why, but I had to force-inline the device code. The working code now looks like this:

sth.cu
    ...
    __host__ __device__ __forceinline__ void do_sth(int *a, int *b){
      a[1]=....
      //some code
    }
    ...
    __global__ void sth(int *a, int *b){
      do_sth(a,b);
    }

    ...
    void dev_sth(int *a, int *b){
      if(use_cuda){
        sth<<<blockspergrid, threadsperblock>>>(a,b);
      }else{
        do_sth(a,b);
      }
    }

Thanks anyway.