CPU timing on Jetson TK1

Hi all,
I am trying to measure the processing time of a CPU implementation so I can compare it with a GPU implementation.
I started by using clock_t and clock() from the <time.h> library.
To test it, I used system("sleep 10") to create a pause of 10 seconds.
However, the result I obtain is less than 4 ms.
Here is my program:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <time.h>

int main()
{
    clock_t start, stop;
    double elapsed;

    start = clock();
    system("sleep 10");   /* pause for 10 seconds */
    stop = clock();

    /* CLOCKS_PER_SEC converts clock ticks to seconds */
    elapsed = ((double)(stop - start)) / CLOCKS_PER_SEC;

    FILE *f = fopen("result.txt", "w");
    fprintf(f, " %.0f \n %.0f \n %.0f \n %.2f \n %f \n %f",
            (float)start, (float)stop, (float)(stop - start),
            (float)CLOCKS_PER_SEC, (float)(stop - start) / CLOCKS_PER_SEC,
            elapsed);
    fclose(f);

    return EXIT_SUCCESS;
}

The whole program is a simple frame difference. When I time it with a chronometer I get about 30 s for a full HD video, but clock() gives me approximately half of that.

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include "opencv2/opencv.hpp"

#include <iostream>
#include <sstream>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <time.h>

//using namespace cv;
using namespace std;

int main()
{
    clock_t cpu_startTime, cpu_endTime;
    float fps;                    // average frames per second
    unsigned int frameCount = 0;
    double cpu_ElapseTime = 0;

    cpu_startTime = clock();

    cv::VideoCapture input("/home/ubuntu/MyExamples/MD0/cam2Wildtrack.MP4");
    cv::Mat img, img_prev0, img_prev, frame, frameDelta, thresh, gray_img;

    // read the first frame and prepare it as the initial "previous" frame
    input.read(img);
    img.copyTo(img_prev0);
    cv::cvtColor(img_prev0, img_prev, CV_BGR2GRAY);
    cv::GaussianBlur(img_prev, img_prev, cv::Size(9, 9), 0);

    while (input.read(frame))
    {
        frameCount++;
        cv::cvtColor(frame, gray_img, CV_BGR2GRAY);
        cv::GaussianBlur(gray_img, gray_img, cv::Size(9, 9), 0);
        cv::absdiff(img_prev, gray_img, frameDelta);
        gray_img.copyTo(img_prev);
        cv::threshold(frameDelta, thresh, 25, 255, cv::THRESH_BINARY);

        cv::imshow("Camera", thresh);

        if (frameCount == 1999)
        {
            cpu_endTime = clock();
            cpu_ElapseTime = (cpu_endTime - cpu_startTime) / (float)CLOCKS_PER_SEC;
            fps = float(frameCount) / cpu_ElapseTime;
            FILE *f = fopen("result.txt", "w");
            fprintf(f, "frame per second: %3.2f fps \n elapsed time: %3.2f s",
                    fps, cpu_ElapseTime);
            fclose(f);
            break;
        }
        char c = cv::waitKey(1);
        if (c == 27)
        {
            // exit if ESC is pressed
            break;
        }
    }
    return 0;
}

Any help is much appreciated
Thanks in advance
Kamal

I haven’t tried this out, but if you are measuring CPU clock time rather than actual (wall clock) time, then the result will be the time spent in the CPU core, excluding any time the GPU runs without using the CPU. Are you really interested in knowing the time in the CPU core? If it is actual elapsed time you are interested in, you might want to consider other measures of time.

Hi linuxdev,
Yes, I want to measure the time in the CPU core. In this code and the one above I have not used any CUDA code, so the whole program executes on the CPU.

Am I correct that you are asking why the actual time spent is around double the time spent on the CPU? If so, then things are working as they should.

Consider that the system multitasks; just because the program runs does not mean it locks the CPU to itself. You could in fact rewrite this, or set it up externally, so that a CPU core other than CPU0 works entirely on your program. The goal would be to exclude other processes from using that core.
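As a sketch of one way to do that (the core number is an arbitrary example, and I haven't run this exact snippet on a TK1), Linux provides sched_setaffinity() to restrict the calling process to a chosen core:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);   /* CPU1 is an arbitrary choice; the idea is to stay off CPU0 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* ...the timed workload would run here, pinned to CPU1... */
    return 0;
}

Equivalently, "taskset -c 1 <your program name>" pins an unmodified binary to CPU1.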

Even so, there would still be times your process blocks and stops using the CPU. For example, memory access through the memory controller is a shared resource with other parts of the operating system. Similar can be said about the eMMC.

One thing you might want to experiment with is to “renice” the process to a higher priority. I am only guessing, but as your process priority goes up (as the “nice” value becomes more negative), odds are that the CPU time would more closely approach the total run time. Be careful not to renice too far. Normally this value is “0” (neutral priority), and the following would be a very significant increase in priority (a nice of -5):

sudo nice --adjustment=-5 <your program name>
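
If the program is already running, the same adjustment can be applied to its existing process (the PID is whatever “ps” or “top” reports for your program):

sudo renice -n -5 -p <pid>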

Something else you should consider is finding out where specific code is spending time, not just the program as a whole. See “man gprof” for information on how to compile with profiling flags (warning: this can really slow down a program, but relative times among code blocks should still be good).
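
As a rough sketch of that workflow (the file names here are just examples): compile and link with “-pg”, run the program once to generate “gmon.out”, then ask gprof for the report:

gcc -pg -o myprog myprog.c
./myprog
gprof ./myprog gmon.out > profile.txt

For a C++ program like the OpenCV one above, you would use g++ with the same “-pg” flag.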

Finally, make sure the system itself is not slowing down for some kind of energy savings. Depending on which L4T release you use, you will have either “~nvidia/jetson_clocks.sh”, or “/usr/bin/jetson_clocks”, and probably “/usr/sbin/nvpmodel”. Depending on release, setting to max performance before starting would go something like this (note that a TK1 won’t have nvpmodel, and the actual clocks command will be “~ubuntu/jetson_clocks.sh” or “~nvidia/jetson_clocks.sh” in most cases):

sudo nvpmodel -m 0
sudo jetson_clocks

If performance is maxed out so that the cores never clock down or sleep, then the measured CPU time may come closer to the actual run time.

Hi linuxdev,
Thanks for your reply, but I feel I have not explained my purpose well. I just want to measure the time it takes to execute the whole program (for example the frame-difference program posted above). clock() does not give me the accurate execution time, so for now I am using a chronometer, and I am looking for an accurate way to do it in code.
From what I found on the web, clock() seems to work perfectly and give accurate time, but on the Jetson TK1 it behaves differently?
Am I missing something? Why does clock() not seem to work here, and is there an alternative?
Thanks in advance

“clock()” is not a chronometer time. “Time in execution” has more than one meaning, and the “clock()” version is just a subset of chronometer time. A suspended program can accumulate no clock time and yet have a very long chronometer time. A system where the program runs continuously from start to stop on one CPU core may approximate chronometer time, whereas a program on a heavily multitasking system may have a much longer chronometer time than CPU time (a hybrid between fully suspended and running continuously from start to stop on a single core). You might check “man -S 2 gettimeofday” instead; this tells you the time in the program, which is different from the time in CPU clock cycles. “gettimeofday()” fills in a struct with the current time in seconds and microseconds, and the difference of two such structs is the time span. Running one query at the start and then a second query at the end gives a chronometer time (rather than a CPU clock cycle time).
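
A minimal sketch of that pattern (these are standard POSIX calls, nothing Jetson-specific):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval t0, t1;
    double elapsed;

    gettimeofday(&t0, NULL);   /* first query: start of the timed region */

    /* ...the code being timed goes here... */

    gettimeofday(&t1, NULL);   /* second query: end of the timed region */

    /* combine the seconds and microseconds fields into one value */
    elapsed = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
    printf("elapsed (chronometer) time: %f s\n", elapsed);
    return 0;
}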

See also “man -a ctime”.

Different time functions provide different resolutions; for example, many only provide one-second resolution but offer more formatting options. How fine a resolution you want changes what you need to do to get the time.

A TK1 is no different than other computers when measuring time, but how clock cycles relate to real time is not accounted for with “clock()”. If you look at the “SEE ALSO” part of the “gettimeofday()” man page you’ll see some other time functions.
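
One of those, “clock_gettime()” with CLOCK_MONOTONIC, goes down to nanosecond resolution and is not affected by wall clock adjustments; a minimal sketch (older glibc may need linking with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    double elapsed;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* ...the code being timed goes here... */

    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* tv_nsec is the nanoseconds field */
    elapsed = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("elapsed: %f s\n", elapsed);
    return 0;
}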

About man pages: If you use the “man” command to see “man pages”, then the same thing may have more than one section. “man -a gettimeofday” will cycle through all man pages with that title…as you quit one man page the next will show up. “man -S 2 gettimeofday” will specifically show man page section 2. Also, unless you tell a Jetson to install the man pages those pages won’t be present, but the Ubuntu host probably has those man pages.

Hi linuxdev,
If I understood correctly, clock() returns the time the program would have spent if it had run on a single core, which is not how it actually runs on the Jetson's ARM CPU cores.
gettimeofday() works fine with my code, thanks a lot. I know this is not a precise method for computing frames per second, since it does not measure how much time each pixel needs to be processed, but it gives a fair measurement.
Thanks a lot for your help