Increment a host-visible variable in kernel code and report progress back to the CPU continuously

I want to monitor the progress of my GPU kernel from the CPU using a counter variable, "global_count", that the kernel increments. I wrote code that works perfectly with CPU multi-threading but not on a CUDA GPU. Is there any way to get the GPU's progress from the CPU at all times? (Suppose the GPU will be busy for many hours and I want to know what percentage of the work has been done.)
My code is:
main.cu:

pthread_create(&progress_report_tid, NULL, progress_report, &args_progress); // this CPU thread monitors the GPU code: args_progress.global_count

kernal_algo<<<1,1>>>(dev_bf_struct);

progress_report.cu:

void *progress_report(void *progress_report_args){
    // Assumes the argument points to the total work size stored as a uint64_t
    const double size_total = *((uint64_t *)progress_report_args);
    while(global_count < size_total)
    {
        fprintf(stderr, "%0.1f%%", (global_count/size_total)*100);
        sleep(1);
    }
    pthread_exit(NULL);
}

The global counter is declared in the .cuh file as:

extern __device__ __managed__ uint64_t global_count;

Kernal_code.cu

__global__ void kernal_algo(args){
    // ... parallel kernel code ...
    global_count++;
}
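
For comparison, one approach often suggested for this kind of progress reporting is to keep the counter in mapped, page-locked host memory instead of managed memory. On devices without concurrent managed access (for example pre-Pascal GPUs, or Windows), the host is not allowed to touch managed memory while a kernel is running. The following is only a minimal sketch under that assumption, not a drop-in fix for the code above: `work_kernel` and `run_with_progress` are hypothetical names, the "work" is a placeholder, and each block adds its item count once rather than every thread incrementing the counter.

```cuda
#include <cstdio>
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread handles one work item; thread 0 of each
// block then adds the block's item count to a counter in mapped host memory.
__global__ void work_kernel(unsigned long long *progress, unsigned long long total)
{
    unsigned long long start = (unsigned long long)blockIdx.x * blockDim.x;
    unsigned long long i = start + threadIdx.x;
    if (i < total) {
        // ... the real per-item work would go here ...
    }
    __syncthreads();  // wait until the whole block has finished its items
    if (threadIdx.x == 0) {
        unsigned long long remaining = total - start;
        unsigned long long n = remaining < blockDim.x ? remaining : blockDim.x;
        atomicAdd(progress, n);   // atomic with respect to other device threads
        __threadfence_system();   // make the update visible to the host
    }
}

// Launches the kernel and prints the progress from the host while it runs.
// Returns the final counter value, or 0 if total is 0 or a CUDA call fails.
unsigned long long run_with_progress(unsigned long long total)
{
    if (total == 0) return 0;

    unsigned long long *h_progress = NULL;  // host view of the counter
    unsigned long long *d_progress = NULL;  // device view of the same memory
    if (cudaHostAlloc((void **)&h_progress, sizeof(*h_progress),
                      cudaHostAllocMapped) != cudaSuccess) return 0;
    *h_progress = 0;
    if (cudaHostGetDevicePointer((void **)&d_progress, h_progress, 0) != cudaSuccess) {
        cudaFreeHost(h_progress);
        return 0;
    }

    cudaEvent_t done;
    cudaEventCreate(&done);

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((total + threads - 1) / threads);
    work_kernel<<<blocks, threads>>>(d_progress, total);  // asynchronous launch
    cudaEventRecord(done);                                // completes after the kernel

    // Poll through a volatile pointer so the host re-reads memory every time.
    volatile unsigned long long *seen = h_progress;
    while (cudaEventQuery(done) == cudaErrorNotReady) {
        fprintf(stderr, "\r%0.1f%%", 100.0 * (double)*seen / (double)total);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }

    cudaError_t err = cudaDeviceSynchronize();
    unsigned long long final_count = (err == cudaSuccess) ? *seen : 0;
    fprintf(stderr, "\r%0.1f%%\n", 100.0 * (double)final_count / (double)total);

    cudaEventDestroy(done);
    cudaFreeHost(h_progress);
    return final_count;
}
```

Calling `run_with_progress(N)` from `main` launches the placeholder kernel over `N` items and prints the percentage until the kernel finishes. The polling loop could equally live in a separate pthread, as in the code above, since it only reads the counter.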