Jetson TX2 not using GPU for my OpenCV Caffe model

hari.rathod · July 26, 2018, 11:33am

Hi all,

I installed OpenCV on my Jetson TX2 following the JetsonHacks instructions, and I wanted to see what fps I get on the board.
I am using the OpenCV dnn module and code from an online course titled "Object Detection Using Deep Learning". Below is the modified code:

# USAGE
'''
python deep_learning_with_opencv.py --image images/camel.jpg --prototxt VGG16.prototxt --model VGG.caffemodel --labels synset_words.txt

'''

# import the necessary packages
import numpy as np
import argparse
import cv2
import time

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to input image")
ap.add_argument("-p", "--prototxt", required=True,
                help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
                help="path to Caffe pre-trained model")
ap.add_argument("-l", "--labels", required=True,
                help="path to ImageNet labels (i.e., syn-sets)")
args = vars(ap.parse_args())

# load the input image from disk
image = cv2.imread(args["image"])

# load the class labels from disk
rows = open(args["labels"]).read().strip().split("\n")
classes = [r[r.find(" ") + 1:].split(",")[0] for r in rows]

# our CNN requires fixed spatial dimensions for our input image(s)
# so we need to ensure it is resized to 224x224 pixels while
# performing mean subtraction (104, 117, 123) to normalize the input;
# after executing this command our "blob" now has the shape:
# (1, 3, 224, 224)
blob = cv2.dnn.blobFromImage(image, 1, (224, 224), (104, 117, 123))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# set the blob as input to the network and perform a forward-pass to
# obtain our output classification
tic = time.time()
tic1 = time.time()
net.setInput(blob)
preds = net.forward()
toc = time.time()


# sort the indexes of the probabilities in descending order (higher
# probability first) and grab the top-5 predictions
idxs = np.argsort(preds[0])[::-1][:5]

# loop over the top-5 predictions and display them
for (i, idx) in enumerate(idxs):
    # draw the top prediction on the input image
    if i == 0:
        text = "Label: {}, {:.2f}%".format(classes[idx], preds[0][idx] * 100)
        cv2.putText(image, text, (5, 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    # display the predicted label + associated probability to the
    # console
    print("[INFO] {}. label: {}, probability: {:.5}".format(i + 1, classes[idx], preds[0][idx]))

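toc1 = time.time()
# tic/toc cover the forward pass only; tic1/toc1 also cover post-processing
print("Inference time: %.3f s" % (toc - tic))
print("Inference + post-processing time: %.3f s" % (toc1 - tic1))
# frames per second implied by the forward pass alone
fps = 1 / (toc - tic)
print("FPS: %.2f" % fps)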
# display the output image
cv2.imshow("Image", image)
cv2.waitKey(0)

When I tested this on my PC (i7, no dedicated graphics card), I got around 6 fps (0.15 s inference time). But on the Jetson TX2 I get a pathetic 0.8 fps. I then checked with sudo ./tegrastats and saw that while the code is running, the GPU reading stays constant at GR3D_FREQ 21%@140 (I will try to share a screenshot).

Is there something I am doing wrong? How do I get OpenCV to use the GPU? Or is it not even possible?
Any help will be greatly appreciated.

Regards,
Hari.
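
The fps figures above come from a single timed forward pass, and a first call can include one-time setup cost, so one sample may not reflect steady-state throughput. A minimal sketch of a fairer measurement, assuming a hypothetical helper named measure_fps that is not part of the original script: it runs a few untimed warm-up calls and then averages several timed calls.

```python
import time


def measure_fps(run_once, warmup=2, runs=10):
    """Return (mean_seconds_per_call, calls_per_second) for run_once().

    measure_fps is a hypothetical helper, not part of the original script.
    Warm-up calls are executed but not timed.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    mean_s = elapsed / runs
    return mean_s, (1.0 / mean_s if mean_s > 0 else float("inf"))


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without OpenCV or a model file.
    mean_s, fps = measure_fps(lambda: sum(range(100000)))
    print("mean %.4f s per call, %.1f calls per second" % (mean_s, fps))
```

In the script above, the workload would be `lambda: net.forward()` after `net.setInput(blob)` has been called once.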

WayneWWW · July 27, 2018, 5:02am

Please run “sudo ./jetson_clocks.sh”

hari.rathod · July 27, 2018, 5:49am

Hi,

I ran jetson_clocks.sh before testing.
I have also changed the nvpmodel mode to 0, but the GPU is still not utilized.
Are there any changes required to the code that will enable the GPU?
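
Whether a code change can help depends on how OpenCV was built. The dnn module gained a CUDA backend in OpenCV 4.2, so an older 3.x build would presumably run net.forward() on the CPU whatever the clock settings are. A minimal sketch, assuming OpenCV 4.2 or newer compiled with CUDA and cuDNN; try_enable_cuda is a hypothetical helper name, and the cv2 module is passed in as an argument so the function can be exercised without a GPU.

```python
def try_enable_cuda(net, cv):
    """Request the CUDA backend for `net` if this OpenCV build offers it.

    `cv` is the imported cv2 module. Returns True if the CUDA backend and
    target were requested, False if the build or the device does not allow it.
    try_enable_cuda is a hypothetical helper, not an OpenCV function.
    """
    backend = getattr(cv.dnn, "DNN_BACKEND_CUDA", None)
    target = getattr(cv.dnn, "DNN_TARGET_CUDA", None)
    if backend is None or target is None:
        # Constants are missing: OpenCV is older than 4.2.
        return False
    cuda = getattr(cv, "cuda", None)
    if cuda is None or cuda.getCudaEnabledDeviceCount() < 1:
        # OpenCV was built without CUDA, or no CUDA device is visible.
        return False
    net.setPreferableBackend(backend)
    net.setPreferableTarget(target)
    return True
```

In the script above, this would be called as `try_enable_cuda(net, cv2)` right after `cv2.dnn.readNetFromCaffe(...)`. A return value of False suggests OpenCV needs to be rebuilt with CUDA support before the GPU can be used by the dnn module.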

WayneWWW · July 27, 2018, 5:51am

GR3D_FREQ 21%@140 → Your GPU clock is still at 140 MHz, which is far from the maximum.

Could you paste the full tegrastats log?

hari.rathod · July 27, 2018, 6:20am

We have also tried running the object detection from the repository below on the TX2, where GR3D usage was 99%@1300.

https://github.com/dusty-nv/jetson-inference

While running my OpenCV script above, the GPU does not seem to be used at all: the frequency does not increase the way it does when we run the object detection.

WayneWWW · July 27, 2018, 6:43am

If jetson_clocks.sh has been applied correctly, the GPU clock is set to its maximum even when there is no activity.

Thus, there are two questions here:

1. Why jetson_clocks.sh does not raise your GPU clock.
2. Whether this is really a CPU-bound application rather than a GPU one.

To understand exactly what is going on, I would like to see your tegrastats log while your app is running.

hari.rathod · July 27, 2018, 6:57am

Hi,
I am sharing the tegrastats log for the object detection code, where we were able to observe the rise in GPU clock frequency.

I started monitoring GR3D_FREQ before running the code. It starts at 140 MHz, and once the code is running the GPU clock reaches 1122 MHz (since I am running in mode 2).

nvidia@tegra-ubuntu:~$ sudo ./tegrastats
RAM 4518/7846MB (lfb 273x4MB) CPU [0%@806,0%@1342,0%@1346,0%@806,0%@806,0%@806] EMC_FREQ 2%@1866 GR3D_FREQ 59%@140 APE 150 MTS fg 0% bg 6% BCPU@34C MCPU@34C GPU@32.5C PLL@34C Tboard@30C Tdiode@30.5C PMIC@100C thermal@32.9C VDD_IN 4244/4244 VDD_CPU 917/917 VDD_GPU 153/153 VDD_SOC 917/917 VDD_WIFI 0/0 VDD_DDR 1380/1380
RAM 4632/7846MB (lfb 264x4MB) CPU [5%@345,72%@1393,21%@1390,8%@345,11%@345,7%@345] EMC_FREQ 2%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 4% BCPU@34C MCPU@34C GPU@32.5C PLL@34C Tboard@30C Tdiode@30.75C PMIC@100C thermal@33.4C VDD_IN 4204/4224 VDD_CPU 917/917 VDD_GPU 153/153 VDD_SOC 917/917 VDD_WIFI 0/0 VDD_DDR 1363/1371
RAM 4649/7846MB (lfb 262x4MB) CPU [21%@1113,11%@345,38%@345,30%@1114,32%@1113,45%@1113] EMC_FREQ 2%@1866 GR3D_FREQ 3%@140 APE 150 MTS fg 0% bg 3% BCPU@34C MCPU@34C GPU@32.5C PLL@34C Tboard@30C Tdiode@30.75C PMIC@100C thermal@33.4C VDD_IN 4701/4383 VDD_CPU 611/815 VDD_GPU 229/178 VDD_SOC 1223/1019 VDD_WIFI 0/0 VDD_DDR 1418/1387
RAM 4710/7846MB (lfb 250x4MB) CPU [41%@652,8%@499,7%@499,40%@652,49%@652,50%@652] EMC_FREQ 10%@1866 GR3D_FREQ 90%@1122 APE 150 MTS fg 0% bg 1% BCPU@34C MCPU@34C GPU@36C PLL@34C Tboard@30C Tdiode@33C PMIC@100C thermal@33.6C VDD_IN 8402/5387 VDD_CPU 458/725 VDD_GPU 2598/783 VDD_SOC 1680/1184 VDD_WIFI 0/0 VDD_DDR 2089/1562
RAM 4710/7846MB (lfb 250x4MB) CPU [50%@345,0%@345,0%@345,46%@345,43%@345,52%@345] EMC_FREQ 18%@1866 GR3D_FREQ 15%@1122 APE 150 MTS fg 0% bg 0% BCPU@34.5C MCPU@34.5C GPU@34.5C PLL@34.5C Tboard@30C Tdiode@34.5C PMIC@100C thermal@34C VDD_IN 9429/6196 VDD_CPU 381/656 VDD_GPU 3513/1329 VDD_SOC 1832/1313 VDD_WIFI 0/0 VDD_DDR 2261/1702
RAM 4710/7846MB (lfb 250x4MB) CPU [45%@345,0%@345,0%@345,39%@345,46%@345,40%@345] EMC_FREQ 19%@1866 GR3D_FREQ 41%@1032 APE 150 MTS fg 0% bg 0% BCPU@34.5C MCPU@34.5C GPU@34.5C PLL@34.5C Tboard@30C Tdiode@34C PMIC@100C thermal@34.5C VDD_IN 8742/6620 VDD_CPU 305/598 VDD_GPU 3285/1655 VDD_SOC 1832/1400 VDD_WIFI 0/0 VDD_DDR 2185/1782
RAM 4710/7846MB (lfb 250x4MB) CPU [44%@499,0%@345,0%@345,35%@498,40%@498,43%@498] EMC_FREQ 20%@1866 GR3D_FREQ 91%@1032 APE 150 MTS fg 0% bg 0% BCPU@34.5C MCPU@34.5C GPU@36.5C PLL@34.5C Tboard@30C Tdiode@34.5C PMIC@100C thermal@34.9C VDD_IN 9009/6961 VDD_CPU 305/556 VDD_GPU 3515/1920 VDD_SOC 1832/1461 VDD_WIFI 0/0 VDD_DDR 2261/1851
RAM 4710/7846MB (lfb 250x4MB) CPU [42%@498,0%@345,0%@345,42%@499,49%@499,52%@499] EMC_FREQ 21%@1866 GR3D_FREQ 15%@1122 APE 150 MTS fg 0% bg 0% BCPU@34.5C MCPU@34.5C GPU@36C PLL@34.5C Tboard@30C Tdiode@35.25C PMIC@100C thermal@35.3C VDD_IN 8742/7184 VDD_CPU 381/534 VDD_GPU 3285/2091 VDD_SOC 1832/1508 VDD_WIFI 0/0 VDD_DDR 2204/1895
RAM 4710/7846MB (lfb 250x4MB) CPU [46%@499,0%@345,0%@345,44%@499,49%@499,43%@345] EMC_FREQ 22%@1866 GR3D_FREQ 7%@1122 APE 150 MTS fg 0% bg 0% BCPU@34.5C MCPU@34.5C GPU@36.5C PLL@34.5C Tboard@31C Tdiode@35.5C PMIC@100C thermal@35.2C VDD_IN 9124/7399 VDD_CPU 381/517 VDD_GPU 3515/2249 VDD_SOC 1832/1544 VDD_WIFI 0/0 VDD_DDR 2261/1935
RAM 4710/7846MB (lfb 250x4MB) CPU [39%@345,0%@345,0%@345,40%@345,52%@345,47%@345] EMC_FREQ 21%@1866 GR3D_FREQ 54%@1122 APE 150 MTS fg 0% bg 0% BCPU@35C MCPU@35C GPU@36.5C PLL@35C Tboard@31C Tdiode@35C PMIC@100C thermal@35.8C VDD_IN 9047/7564 VDD_CPU 381/503 VDD_GPU 3591/2383 VDD_SOC 1832/1572 VDD_WIFI 0/0 VDD_DDR 2242/1966
RAM 4710/7846MB (lfb 250x4MB) CPU [50%@652,0%@345,0%@345,41%@652,44%@652,48%@653] EMC_FREQ 22%@1866 GR3D_FREQ 17%@1122 APE 150 MTS fg 0% bg 0% BCPU@35C MCPU@35C GPU@36.5C PLL@35C Tboard@31C Tdiode@34.5C PMIC@100C thermal@35.8C VDD_IN 9085/7702 VDD_CPU 381/492 VDD_GPU 3515/2486 VDD_SOC 1832/1596 VDD_WIFI 0/0 VDD_DDR 2242/1991
RAM 4710/7846MB (lfb 250x4MB) CPU [43%@498,0%@345,0%@345,33%@499,48%@499,46%@499] EMC_FREQ 22%@1866 GR3D_FREQ 75%@1122 APE 150 MTS fg 0% bg 0% BCPU@35C MCPU@35C GPU@36.5C PLL@35C Tboard@31C Tdiode@35.25C PMIC@100C thermal@35.4C VDD_IN 9238/7830 VDD_CPU 381/483 VDD_GPU 3667/2584 VDD_SOC 1832/1616 VDD_WIFI 0/0 VDD_DDR 2261/2013
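
A log like the one above is easier to compare between runs once the GR3D_FREQ field is pulled out of each line. A minimal sketch, assuming the `GR3D_FREQ <load>%@<clock>` layout seen in this log (other tegrastats versions may print the field differently, and lines that do not match are skipped); summarize_gr3d is a hypothetical helper name.

```python
import re

# Matches the GPU field of a tegrastats line, e.g. "GR3D_FREQ 90%@1122".
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%@(\d+)")


def summarize_gr3d(lines):
    """Return GPU load and clock statistics from tegrastats output lines.

    Lines without a matching GR3D_FREQ field are ignored.
    Returns None if no line matched.
    """
    samples = []
    for line in lines:
        match = GR3D_PATTERN.search(line)
        if match:
            samples.append((int(match.group(1)), int(match.group(2))))
    if not samples:
        return None
    loads = [load for load, _ in samples]
    clocks = [clock for _, clock in samples]
    return {
        "samples": len(samples),
        "mean_load_pct": sum(loads) / len(loads),
        "max_load_pct": max(loads),
        "min_clock_mhz": min(clocks),
        "max_clock_mhz": max(clocks),
    }


if __name__ == "__main__":
    # Shortened excerpts of three lines from the log above.
    excerpt = [
        "EMC_FREQ 2%@1866 GR3D_FREQ 59%@140 APE 150",
        "EMC_FREQ 10%@1866 GR3D_FREQ 90%@1122 APE 150",
        "EMC_FREQ 19%@1866 GR3D_FREQ 41%@1032 APE 150",
    ]
    print(summarize_gr3d(excerpt))
```

Running the same summary on a log captured during the OpenCV script would show whether the clock ever leaves its idle value.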

red8341h · October 18, 2019, 12:40am

I am facing the same problem. If you solved this issue, please let me know how.
