Increment a host-visible variable in kernel code and continuously report progress back to the CPU

I want to monitor the progress of my GPU kernel from the CPU using a counter variable, “global_count”, that the kernel increments. I wrote code that works perfectly with CPU multithreading, but not with CUDA on the GPU. Is there any way to get the GPU's progress from the CPU continuously? (Suppose the GPU will be busy for many hours, and I want to know what percentage of the work has been done.)
My code is:

pthread_create(&progress_report_tid, NULL, progress_report, &args_progress); // this CPU thread monitors the GPU code: args_progress.global_count

void *progress_report(void *progress_report_args) {
const double size_total = ((uint64_t)progress_report_args);
while (global_count < size_total)

The global counter is declared in the .cuh file as:

extern __device__ __managed__ uint64_t global_count;

The kernel code runs in parallel.
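A pattern often used for this kind of live progress reporting is to keep the counter in mapped pinned (zero-copy) host memory rather than in a `__managed__` variable, since the CPU can generally only read managed memory while a kernel is running on devices that report `concurrentManagedAccess`. The following is a minimal sketch under stated assumptions, not a drop-in fix: `work_kernel`, `run_with_progress`, the 256-thread block size, and the 100 ms polling interval are all hypothetical. The real per-item work is left as a placeholder, and error checking is omitted for brevity.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one item; one thread per block
// then adds that block's item count to the host-visible progress counter.
__global__ void work_kernel(unsigned long long *progress, unsigned long long total)
{
    unsigned long long first = blockIdx.x * (unsigned long long)blockDim.x;
    unsigned long long i = first + threadIdx.x;
    if (i < total) {
        // ... placeholder for the real work on item i ...
    }
    __syncthreads();  // every thread reaches this point (no early return above)
    if (threadIdx.x == 0 && first < total) {
        unsigned long long remaining = total - first;
        unsigned long long done = remaining < blockDim.x ? remaining : blockDim.x;
        atomicAdd(progress, done);  // atomic among device threads; the host only reads
        __threadfence_system();     // make the update visible to the host
    }
}

// Hypothetical host helper: launches the kernel, polls the counter until the
// kernel has finished, and returns the final counter value.
// Assumes a 64-bit platform with unified addressing and a grid that fits in 1-D.
unsigned long long run_with_progress(unsigned long long total)
{
    if (total == 0) return 0;

    unsigned long long *h_progress = nullptr;
    unsigned long long *d_progress = nullptr;
    cudaHostAlloc((void **)&h_progress, sizeof(*h_progress), cudaHostAllocMapped);
    *h_progress = 0;
    cudaHostGetDevicePointer((void **)&d_progress, h_progress, 0);

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((total + threads - 1) / threads);

    cudaEvent_t finished;
    cudaEventCreate(&finished);
    work_kernel<<<blocks, threads>>>(d_progress, total);  // asynchronous launch
    cudaEventRecord(finished);

    // Read through a volatile pointer so the compiler reloads the value each time.
    volatile unsigned long long *seen = h_progress;
    while (cudaEventQuery(finished) == cudaErrorNotReady) {
        printf("progress: %.1f%%\n", 100.0 * (double)*seen / (double)total);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    cudaDeviceSynchronize();

    unsigned long long final_count = *seen;
    cudaEventDestroy(finished);
    cudaFreeHost(h_progress);
    return final_count;
}
```

Calling `run_with_progress(total)` from `main`, or from the existing monitoring pthread, prints a percentage until the kernel finishes and then returns the final counter value. Updating the counter once per block rather than once per thread keeps traffic over the bus low, and for a kernel that runs for hours a much coarser polling interval would likely be enough.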