Clang OpenMP Offloading

I’m trying to offload C code to a GPU with OpenMP.

My C code is given below.

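#include <unistd.h>
#include <stdlib.h>
#include <omp.h>
#include <stdio.h>

double start;
double end;
int main (void) {
  /* long long: the final value, 4,000,000,000, does not fit in a 32-bit int */
  long long sum = 0;
  start = omp_get_wtime();
  /* number of offload devices visible to the OpenMP runtime */
  printf("%d\n", omp_get_num_devices());

  /* reduction(+:sum) gives each thread a private copy of sum and combines
     them at the end, avoiding a data race on the shared variable */
  #pragma omp target parallel for reduction(+:sum) map(tofrom:sum)
  for (int i = 0; i < 2000000000; i++) {
    sum += 2;
  }
  end = omp_get_wtime();
  printf("time %f\n", (end - start));
  printf("sum = %lld\n", sum);
  return 0;
}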
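
For comparison, a GPU-oriented variant of the same loop might add teams: a target region without teams typically executes in a single team (on NVIDIA GPUs, roughly one thread block), whereas target teams distribute parallel for spreads iterations across many teams. This is a minimal sketch assuming an OpenMP 4.5-capable compiler; the function name sum_of_twos is hypothetical. The accumulator is long long because 2 × 2,000,000,000 = 4,000,000,000 exceeds INT_MAX for a 32-bit int.

```c
/* Hypothetical helper: returns 2 * n using an offloaded reduction.
 * Compiled without -fopenmp, the pragma is ignored and the loop runs
 * serially on the host; with OpenMP but no usable device, the target
 * region falls back to the host. Either way the result is 2 * n. */
static long long sum_of_twos(long long n) {
    long long sum = 0;
    /* teams + distribute split the iterations across teams (thread blocks);
     * reduction(+:sum) removes the data race on sum;
     * map(tofrom:sum) copies the combined result back to the host. */
    #pragma omp target teams distribute parallel for reduction(+:sum) map(tofrom:sum)
    for (long long i = 0; i < n; i++) {
        sum += 2;
    }
    return sum;
}
```

In main, this would be called as sum = sum_of_twos(2000000000LL); and printed with %lld.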

I’m using Clang 4.0.0 and CUDA 9.0.

I compiled the code with

clang -target x86_64-linux-gnu test1.c -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda

and it compiled without any errors.

However, when I run it with “nvprof ./a.out”, it executes only on the host and not on the device.
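
One way to check where a target region actually runs, independently of nvprof, is to call the OpenMP routine omp_is_initial_device() inside it: it returns nonzero when executing on the host. A minimal sketch follows; the helper names target_ran_on_host and report_offload_status are hypothetical, and the _OPENMP guards exist only so the file also compiles without -fopenmp.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if an OpenMP target region ran on the
 * host (the initial device), 0 if it ran on an offload device. */
static int target_ran_on_host(void) {
    int on_host = 1; /* without OpenMP, everything runs on the host */
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        /* nonzero when this code executes on the host */
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}

/* Hypothetical helper: prints the device count and where the region ran. */
static void report_offload_status(void) {
#ifdef _OPENMP
    printf("devices visible: %d\n", omp_get_num_devices());
#endif
    printf("target region ran on %s\n",
           target_ran_on_host() ? "the host" : "a device");
}
```

If omp_get_num_devices() reports 0, the runtime found no usable device, and OpenMP falls back to executing target regions on the host, which would match what nvprof shows.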