CUDA dynamic parallelism

I am trying to compile my CUDA code with clang++-4.0 using the following command:

clang++-4.0 -c bellmanFord.cu -Wall -std=c++14

Then I link the object file with the main function to create the executable. However, I launch a kernel from inside a __global__ function, which results in the following error:

bellmanFord.cu:131:13: error: reference to __global__ function 'relax' in __global__ function
            relax<<<1, count>>>(tid, count);
            ^
bellmanFord.cu:107:6: note: 'relax' declared here
void relax(int u, int count){
     ^
1 error generated.
Makefile:16: recipe for target 'bellmanFord.o' failed
make: *** [bellmanFord.o] Error 1
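
For reference, the part of bellmanFord.cu that triggers the error has roughly the shape below. This is a trimmed sketch rather than the actual file: the signature of `relax` and the launch line are taken from the compiler output above, while the parent kernel name `bellmanFordKernel`, its parameter, the index computation, the guard condition, and the empty body of `relax` are placeholders.

```cuda
#include <cuda_runtime.h>

// Child kernel. The signature matches the one in the compiler note;
// the body is omitted here (placeholder).
__global__ void relax(int u, int count) {
    // ... edge relaxation for vertex u ...
}

// Parent kernel. The name, parameter, and guard are hypothetical;
// only the nested launch line is taken from the error message.
__global__ void bellmanFordKernel(int count) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < count) {
        // Kernel launch from device code (dynamic parallelism).
        // This is the line clang++-4.0 rejects.
        relax<<<1, count>>>(tid, count);
    }
}
```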

The following are the files:

  1. bellmanFord.hpp
  2. bellmanFord.cu
  3. main.cpp

I compile main.cpp and link it with bellmanFord.o as follows:

clang++-4.0 -o main main.cpp bellmanFord.o -L/usr/local/cuda/lib64 -lcudart -std=c++14 -code=compute_35 -I/usr/local/cuda/include