Is there a way to measure the time of a function using the block scheduler?

I have a script that measures how long a function runs under different schedulers. I substituted a kwait function for the function I wanted to test, and the issue is still there. kwait simply runs for 2 seconds and then returns. With every other scheduler setting, the measured time is what I expect. With the block scheduler, however, the measurement reports that the function ran for 3 nanoseconds or less. Is there a method I can use to measure the time accurately, specifically with the block scheduler?

Your question doesn’t make sense to me. The block scheduler is not something the CUDA programmer can directly access or control.

Perhaps you should show an example of what you are doing there, with the block scheduler.

This image is just of the clock:

[image]

This is kwait:

[image]

This is how I switch which scheduler I want

None of that has anything to do with the CUDA block scheduler.

Please don’t post pictures of code on these forums.

I suspect the problem you are having is some interaction between the clock() function you are using and the CPU thread behavior when the cudaDeviceScheduleBlockingSync policy is selected.

With an appropriate time resource, I don’t seem to have any difficulty getting a time measurement:

```
# cat t136.cu
#include <iostream>
#include <time.h>
#include <sys/time.h>
#define USECPSEC 1000000ULL

unsigned long long dtime_usec(unsigned long long start=0){

  timeval tv;
  gettimeofday(&tv, 0);
  return ((tv.tv_sec*USECPSEC)+tv.tv_usec)-start;
}

__global__ void k(unsigned long long dur){
  unsigned long long start = clock64();
  while (clock64() < (start+dur));}


int main(){

  unsigned long long my_duration = 20000000ULL;
  cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
  unsigned long long dt = dtime_usec(0);
  for (int i = 0; i < 100; i++) {
    k<<<1,1>>>(my_duration);
    cudaDeviceSynchronize();}
  dt = dtime_usec(dt);
  std::cout << "elapsed time: " << dt/(float)USECPSEC << "s" << std::endl;
}
# nvcc -o t136 t136.cu
# ./t136
elapsed time: 0.985097s
#
```

This works perfectly, thanks! Sorry for my earlier confusion and for posting images!