CPU timing on Jetson TK1

Hi all,
I am trying to measure the processing time of a CPU implementation so I can compare it with a GPU implementation.
I started by using clock_t and clock() from the <time.h> library.
To test it, I used system("sleep 10") to create a pause of 10 seconds.
However, the result I obtain is less than 4 ms.
Here is my program:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <time.h>

int main()
{
    clock_t start, stop;
    double elapsed;

    start = clock();
    system("sleep 10");   /* pause for 10 seconds */
    stop = clock();

    /* CLOCKS_PER_SEC converts clock ticks to seconds */
    elapsed = ((double)(stop - start)) / CLOCKS_PER_SEC;

    FILE *f = fopen("result.txt", "w");
    fprintf(f, " %.0f \n %.0f \n %.0f \n %.2f \n %f \n %f",
            (float)start, (float)stop, (float)(stop - start),
            (float)CLOCKS_PER_SEC, (float)(stop - start) / CLOCKS_PER_SEC,
            elapsed);
    fclose(f);

    return EXIT_SUCCESS;
}

The whole program is a simple frame difference. When I time it with a chronometer I get about 30 s for a full HD video, but clock() gives me approximately half of that.

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "opencv2/opencv.hpp"

#include <iostream>
#include <sstream>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <time.h>

//using namespace cv;
using namespace std;

int main()
{
    clock_t cpu_startTime, cpu_endTime;
    float fps;                    // average frames per second
    unsigned int frameCount = 0;
    double cpu_ElapseTime = 0;

    cpu_startTime = clock();

    cv::VideoCapture input("/home/ubuntu/MyExamples/MD0/cam2Wildtrack.MP4");
    cv::Mat img, img_prev0, img_prev, frame, frameDelta, thresh, gray_img;

    // read the first frame and prepare it as the initial "previous" frame
    input.read(img);
    img.copyTo(img_prev0);
    cv::cvtColor(img_prev0, img_prev, CV_BGR2GRAY);
    cv::GaussianBlur(img_prev, img_prev, cv::Size(9, 9), 0);

    while (input.read(frame))
    {
        frameCount++;
        cv::cvtColor(frame, gray_img, CV_BGR2GRAY);
        cv::GaussianBlur(gray_img, gray_img, cv::Size(9, 9), 0);
        cv::absdiff(img_prev, gray_img, frameDelta);
        gray_img.copyTo(img_prev);
        cv::threshold(frameDelta, thresh, 25, 255, cv::THRESH_BINARY);

        cv::imshow("Camera", thresh);

        if (frameCount == 1999)
        {
            cpu_endTime = clock();
            cpu_ElapseTime = (cpu_endTime - cpu_startTime) / (float)CLOCKS_PER_SEC;
            fps = float(frameCount) / cpu_ElapseTime;
            FILE *f = fopen("result.txt", "w");
            fprintf(f, "frame per second: %3.2f fps \n elapsed time: %3.2f s",
                    fps, cpu_ElapseTime);
            fclose(f);
            break;
        }
        char c = cv::waitKey(1);
        if (c == 27)
        {
            // exit if ESC is pressed
            break;
        }
    }
    return 0;
}

Any help is much appreciated
Thanks in advance
Kamal

I haven’t tried this out, but if you are measuring CPU clock time rather than actual (wall clock) time, then the result will be the time spent in the CPU core, excluding any time the GPU runs without using the CPU. Are you really interested in knowing the time in the CPU core? If it is actual elapsed time you are interested in, you might want to consider other measures of time.

Hi linuxdev,
Yes, I want to measure the time in the CPU core. In this code and the one above I have not used any CUDA code, so the whole program executes on the CPU.

Am I correct that you are asking why the actual time spent is around double the time spent on the CPU? If so, then things are working as they should.

Consider that the system multitasks; just because the program runs does not mean it locks the CPU to itself. You could in fact rewrite this, or set it up externally, so that a CPU core other than CPU0 works entirely on your program. The goal would be to exclude other processes from using that core.
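As a sketch of one way to do that (the core number is an arbitrary example, and I haven't run this exact snippet on a TK1), Linux provides sched_setaffinity() to restrict the calling process to a chosen core:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);   /* CPU1 is an arbitrary choice; the idea is to stay off CPU0 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* ...the timed workload would run here, pinned to CPU1... */
    return 0;
}

Equivalently, "taskset -c 1 <your program name>" pins an unmodified binary to CPU1.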

Even so, there would still be times your process blocks and stops using the CPU. For example, memory access through the memory controller is a shared resource with other parts of the operating system. Similar can be said about the eMMC.

One thing you might want to experiment with is to “renice” the process to a higher priority. I am only guessing, but as your process priority goes up (as the “nice” value becomes more negative), odds are that the CPU time would more closely approach the total run time. Be careful not to renice too far. Normally this value is “0” (neutral priority), and the following would be a very significant increase in priority (a nice of -5):

sudo nice --adjustment=-5 <your program name>
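
If the program is already running, the same adjustment can be applied to its existing process (the PID is whatever “ps” or “top” reports for your program):

sudo renice -n -5 -p <pid>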

Something else you should consider is finding out where specific code is spending time, not just the program as a whole. See “man gprof” for information on how to compile with profiling flags (warning: this can really slow down a program, but relative times among code blocks should still be good).
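
As a rough sketch of that workflow (the file names here are just examples): compile and link with “-pg”, run the program once to generate “gmon.out”, then ask gprof for the report:

gcc -pg -o myprog myprog.c
./myprog
gprof ./myprog gmon.out > profile.txt

For a C++ program like the OpenCV one above, you would use g++ with the same “-pg” flag.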

Finally, make sure the system itself is not slowing down for some kind of energy savings. Depending on which L4T release you use, you will have either “~nvidia/jetson_clocks.sh”, or “/usr/bin/jetson_clocks”, and probably “/usr/sbin/nvpmodel”. Depending on release, setting to max performance before starting would go something like this (note that a TK1 won’t have nvpmodel, and the actual clocks command will be “~ubuntu/jetson_clocks.sh” or “~nvidia/jetson_clocks.sh” in most cases):

sudo nvpmodel -m 0
sudo jetson_clocks

If performance is maxed out so that the cores never clock down or sleep, then the measured CPU time may come closer to the actual run time.

Hi linuxdev,
Thanks for your reply, but I feel I have not explained my purpose well. I just want to measure the time it takes to execute the whole program (for example the frame-difference program posted above). clock() does not give me the accurate execution time, so for now I am using a chronometer, and I am looking for an accurate way to do it in code.
From what I found on the web, clock() seems to work perfectly and give accurate time, but on the Jetson TK1 it behaves differently?
Am I missing something? Why does clock() not seem to work here, and is there an alternative?
Thanks in advance

“clock()” is not a chronometer time. “Time in execution” has more than one meaning, and the “clock()” version is just a subset of chronometer time. A suspended program can accumulate no clock time and yet have a very long chronometer time. A system where the program runs continuously from start to stop on one CPU core may approximate chronometer time, whereas a program on a heavily multitasking system may have a much longer chronometer time than CPU time (a hybrid between fully suspended and running continuously from start to stop on a single core). You might check “man -S 2 gettimeofday” instead; this tells you the time in the program, which is different from the time in CPU clock cycles. “gettimeofday()” fills in a struct with the current time in seconds and microseconds, and the difference of two such structs is the time span. Running one query at the start and then a second query at the end gives a chronometer time (rather than a CPU clock cycle time).
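
A minimal sketch of that pattern (these are standard POSIX calls, nothing Jetson-specific):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval t0, t1;
    double elapsed;

    gettimeofday(&t0, NULL);   /* first query: start of the timed region */

    /* ...the code being timed goes here... */

    gettimeofday(&t1, NULL);   /* second query: end of the timed region */

    /* combine the seconds and microseconds fields into one value */
    elapsed = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
    printf("elapsed (chronometer) time: %f s\n", elapsed);
    return 0;
}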

See also “man -a ctime”.

Different time functions provide different resolutions; for example, many only provide one-second resolution but offer more formatting options. How fine a resolution you want changes what you need to do to get the time.

A TK1 is no different than other computers when measuring time, but how clock cycles relate to real time is not accounted for with “clock()”. If you look at the “SEE ALSO” part of the “gettimeofday()” man page you’ll see some other time functions.
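
One of those, “clock_gettime()” with CLOCK_MONOTONIC, goes down to nanosecond resolution and is not affected by wall clock adjustments; a minimal sketch (older glibc may need linking with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    double elapsed;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* ...the code being timed goes here... */

    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* tv_nsec is the nanoseconds field */
    elapsed = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("elapsed: %f s\n", elapsed);
    return 0;
}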

About man pages: If you use the “man” command to see “man pages”, then the same thing may have more than one section. “man -a gettimeofday” will cycle through all man pages with that title…as you quit one man page the next will show up. “man -S 2 gettimeofday” will specifically show man page section 2. Also, unless you tell a Jetson to install the man pages those pages won’t be present, but the Ubuntu host probably has those man pages.

Hi linuxdev,
If I understood correctly, clock() returns the time the program would have spent if it had run on a single core, which is not how it actually runs on the Jetson's ARM CPU cores.
gettimeofday() works fine with my code, thanks a lot. I know this is not a precise method for computing frames per second, since it does not measure how much time each pixel needs to be processed, but it gives a fair measurement.
Thanks a lot for your help