I have integrated jetson-inference on the Drive PX2 (GitHub: https://github.com/dusty-nv/jetson-inference) and modified one of the samples to detect multiple objects using the jetson-inference library.
I call the Detect() function defined in detectNet.cpp. Inside Detect(), 'execute' is used (see line 513 of https://github.com/dusty-nv/jetson-inference/blob/master/detectNet.cpp).
I used std::chrono::high_resolution_clock to measure the time taken for the 'execute' call to run, which in turn gives me the time the network needs to detect the objects in an image.
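For reference, this is roughly how I take the measurement (a minimal sketch; the image buffer names are placeholders from my own test app, and the exact Detect() overload depends on the jetson-inference version):

```cpp
#include <chrono>
#include <cstdio>
#include "detectNet.h"

// Time one call to detectNet::Detect(), which internally runs the
// TensorRT 'execute' step that I am measuring.
void timeDetection( detectNet* net, float* imgCUDA, uint32_t imgWidth, uint32_t imgHeight )
{
    detectNet::Detection* detections = NULL;

    const auto t0 = std::chrono::high_resolution_clock::now();
    const int numDetections = net->Detect(imgCUDA, imgWidth, imgHeight, &detections);
    const auto t1 = std::chrono::high_resolution_clock::now();

    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("Detect() took %.2f ms, found %d objects\n", ms, numDetections);
}
```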
My initial results on the host PC with FP32 precision were between 50 and 60 ms per image.
A few weeks later, I ran the same time measurement on the same 'execute' call and saw that detection now takes between 150 and 170 ms, roughly 3 times slower than my original measurement on the host PC.
After some digging, I found that the 'execute' call goes into TensorRT (NvInfer.h). I used a Caffe model and prototxt file to create a TensorRT FP32 engine; detectNet::Create() builds the engine through tensorNet.cpp, which is provided in the GitHub repo.
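For completeness, this is roughly how I create the network (a sketch; the file paths are placeholders for my own model files, and the exact Create() overload depends on the jetson-inference version):

```cpp
#include "detectNet.h"

// Load my custom Caffe model; internally this goes through
// tensorNet.cpp, which parses the prototxt/caffemodel and builds
// the TensorRT FP32 engine that Detect() later runs via 'execute'.
detectNet* net = detectNet::Create(
        "deploy.prototxt",       // network definition (placeholder path)
        "snapshot.caffemodel",   // trained Caffe weights (placeholder path)
        0.0f,                    // mean pixel subtraction
        "class_labels.txt",      // class label file (placeholder path)
        0.5f );                  // detection threshold
```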
I came across 'nvprof', but it gives me an error on execution.
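For reference, I invoke it like this (the binary name is a placeholder for my test app):

```
nvprof ./my-detection-app test_image.jpg
```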
I can't figure out how the detection became roughly 3 times slower.
Any help would be appreciated.