No compile error when calling device-only method from templated class on host, using nvcc

When I compile the following code with nvcc, I would expect a compile error, because Functor::print() is only defined on the device but is called from the host function hostExecuteFunctor(F f). However, the compiler finishes without a single warning. When I run the compiled program, it fails silently when executing f.print(). As a result, my program only prints the line “About to execute the functor”.

#include <stdio.h>

struct Functor
{
  __device__ void print()
  {
    printf("Printing\n");
  }
};

template <class F>
void hostExecuteFunctor(F f)
{
  printf("About to execute the functor\n");
  f.print();
}


int main()
{
  Functor f;
  hostExecuteFunctor(f);
  printf("Successfully finished the program.\n");
  return 0;
}

The problem could be solved by declaring the functor method as __host__ __device__, but I would rather stay minimalistic so it’s clear which methods will never run on the CPU.
However, we have a lot of these functors in our code and a mistake is quickly made. Is there a way to get a compile error when the situation specified above occurs?
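
For reference, the __host__ __device__ variant mentioned above could look roughly like the sketch below. The `#ifndef __CUDACC__` fallback is an assumption added here only so that the snippet also builds with a plain host compiler; it is not part of the original code.

```cpp
#include <cstdio>

// Assumption (not in the original post): when the file is not compiled by
// nvcc, define the CUDA qualifiers as empty so the sketch still builds.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

struct Functor
{
  // Marked for both execution spaces, so the host call below is well-defined.
  __host__ __device__ void print()
  {
    printf("Printing\n");
  }
};

template <class F>
void hostExecuteFunctor(F f)
{
  printf("About to execute the functor\n");
  f.print();  // runs the host version of print()
}
```

Calling hostExecuteFunctor(Functor{}) from main() then prints both lines. The trade-off is the one described above: the qualifier no longer documents that the method is meant to run on the device only.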

PS: I noticed that in the opposite case, calling a host function from a device function, the compiler does give an error. See the example below.

#include <stdio.h>

struct Functor
{
  void print()
  {
    printf("Printing\n");
  }
};


template <class F>
__global__ void kernelExecuteFunctor(F f)
{
  printf("About to execute the functor\n");
  f.print();
}


int main()
{
  Functor f;
  kernelExecuteFunctor<<<1, 1>>>(f);
  cudaDeviceSynchronize();
  printf("Successfully finished the program.\n");
  return 0;
}

Compile error:

silent_error_cpu.cu(16): error: calling a __host__ function("Functor::print") from a __global__ function("kernelExecuteFunctor<    ::Functor> ") is not allowed

My setup:
Operating system: Ubuntu 20.04
CUDA toolkit: installed via the Ubuntu package ‘nvidia-cuda-toolkit’, version 10.1.243-3
NVIDIA driver version: 440.100

This maps to nvbug 4783921. We will post the conclusion here once the ticket has been processed internally.

[Public] Hi Thomas,

After our engineering team’s investigation, we think this is expected behavior. Calling a device-only function from host code is a user error.
This is documented as undefined behavior in the CUDA C++ Programming Guide. Ref: CUDA C++ Programming Guide
Under the hood, nvcc introduces a dummy call for the device function “print” (needed for device function registration), and that is what ends up being called in this case.
Emitting a warning for this is not straightforward, because the host code is compiled by the host compiler, which we have no control over.

Best,
Yuki
