I’m trying to offload C code to the GPU with OpenMP.
My C code is below:
#include <stdio.h>
#include <omp.h>

int main(void) {
    double start, end;
    /* 2,000,000,000 iterations * 2 = 4e9, which overflows a 32-bit int */
    long long sum = 0;

    start = omp_get_wtime();
    printf("%d\n", omp_get_num_devices());

    /* reduction(+:sum) avoids a data race on sum between threads;
       map(tofrom:sum) copies the result back to the host */
    #pragma omp target parallel for reduction(+:sum) map(tofrom:sum)
    for (int i = 0; i < 2000000000; i++) {
        sum += 2;
    }

    end = omp_get_wtime();
    printf("time %f\n", end - start);
    printf("sum = %lld\n", sum);
    return 0;
}
I’m using Clang 4.0.0 and CUDA 9.0.
I compiled the code with
clang -target x86_64-linux-gnu test1.c -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
and it compiled without any errors.
But when I run “nvprof ./a.out”, the code runs only on the host and not on the device.
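Besides nvprof, one way to check where a target region actually executed is to call the standard OpenMP routine omp_is_initial_device() inside it: it returns nonzero on the host and 0 on an offload device. If no usable device is available, an OpenMP implementation typically falls back to running the target region on the host, so a toolchain that silently cannot offload would report "host" here. The helper names below (target_region_ran_on_host, report_offload_status) are hypothetical, written only for this sketch; the _OPENMP guards let it compile even without -fopenmp, in which case it always reports the host.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region executed on the
   host (initial device), 0 if it executed on an offload device. */
int target_region_ran_on_host(void)
{
    int on_host = 1; /* without OpenMP, everything runs on the host */
#ifdef _OPENMP
    /* map(from:) copies the flag back to the host after the region */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host;
}

/* Hypothetical helper: prints the device count seen by the runtime
   and where the target region ran. */
void report_offload_status(void)
{
#ifdef _OPENMP
    printf("devices visible: %d\n", omp_get_num_devices());
#else
    printf("compiled without OpenMP support\n");
#endif
    printf("target region ran on %s\n",
           target_region_ran_on_host() ? "host (fallback)" : "device");
}
```

Calling report_offload_status() from main in a binary built with the same clang command shows whether the offload build really runs on the GPU or silently falls back to the host.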