ACC routine error when compiling with header files

I tried the following code

#include <cstdio>
#include <iostream>
#include <vector>
#include "foo1.h"

void foo();

int main(int argc, char** argv)
{
    std::vector<std::vector<std::vector<double>>> vec {
        {{1,2}, {3,4}, {5,6}, {7,8}},
        {{9,10}, {11,12}},
        {{13,14}, {15,16}, {17,18}} };

    double* dVec = new double[4];

    #pragma acc parallel loop
    for (int k = 0; k < 3; k++) {
        foo();   // defined at the bottom of this file
        foo1();  // declared in foo1.h, defined in foo1.cpp

        std::vector<std::vector<double>>& vec2d = vec[k];
        int L = vec2d.size();
        for (int i = 0; i < L; i++)
            dVec[i] = vec2d[i][1] - vec2d[i][0];

        for (int j = 0; j < L; j++) {
            printf("k: %d j: %d vec0: %f, vec1: %f\n", k, j, vec2d[j][0], vec2d[j][1]);
        }
    }

    return 0;
}

void foo() { }

I compiled with pgc++ -fast -ta=tesla:cuda9.2,managed -o runEx foo1.cpp runEx.cpp -std=c++17 && ./runEx

and got the error

PGCC-S-0155-Procedures called in a compute region must have acc routine information: foo1() (runEx.cpp: 20)
PGCC-S-0155-Accelerator region ignored; see -Minfo messages  (runEx.cpp: 14)
PGCC/x86-64 Linux 19.10-0: compilation completed with severe errors

where foo1.h is

void foo1();

and foo1.cpp is

#include "foo1.h"
void foo1(){  }

It doesn’t make sense to me: if I comment out the call to foo1(), the code works, even though foo and foo1 are identical.

Since “foo” is defined in the same file as where it’s called, the definition of foo is visible to the compiler and hence it can implicitly create the device routine for you.

Since “foo1” is defined in a different file, its definition is not visible to the compiler while it compiles runEx.cpp, so it can’t implicitly create the device routine. Instead, just above foo1’s prototype in “foo1.h”, add:

#pragma acc routine seq
void foo1();
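
For a slightly fuller picture, here is a minimal sketch of the same pattern with a function that actually does some work. The names pair_diff and compute_diffs are hypothetical and not from the code above, and everything is collapsed into one listing only to keep it self-contained; in the layout from the question, the prototype would sit in foo1.h, the definition in foo1.cpp, and the loop in runEx.cpp. Because foo1.cpp includes foo1.h, the directive on the prototype is also seen when the definition is compiled, so a device version gets generated there.

```cpp
#include <vector>

// Would live in the header (foo1.h in the question).
// pair_diff is a hypothetical stand-in for foo1.
#pragma acc routine seq
double pair_diff(double a, double b);

// Would live in the calling file (runEx.cpp in the question).
// compute_diffs is a hypothetical helper that wraps the compute region.
std::vector<double> compute_diffs(const std::vector<double>& lo,
                                  const std::vector<double>& hi)
{
    const int n = static_cast<int>(lo.size());
    std::vector<double> out(n);

    // Work on raw pointers so the data clauses can describe the arrays.
    const double* plo = lo.data();
    const double* phi = hi.data();
    double* pout = out.data();

    // The call below is legal in the compute region because the prototype
    // above carries "acc routine seq". A compiler without OpenACC support
    // ignores both pragmas and runs the loop serially on the host.
    #pragma acc parallel loop copyin(plo[0:n], phi[0:n]) copyout(pout[0:n])
    for (int i = 0; i < n; i++) {
        pout[i] = pair_diff(phi[i], plo[i]);
    }
    return out;
}

// Would live in the separate source file (foo1.cpp in the question),
// which includes the header and therefore sees the routine directive.
double pair_diff(double a, double b) { return a - b; }
```

Calling compute_diffs({1.0, 3.0, 5.0}, {2.0, 4.0, 7.0}) should return the element-wise differences hi[i] - lo[i]. The "seq" clause says the routine runs sequentially within a single device thread, which fits a small helper like this one.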