Can KLT_Tracker work on colored images?

Hi everyone,

I am currently using the KLT_Tracker algorithm from the VPI software library, but it needs each frame to be converted to grayscale in order to track objects. Is there another sample that does the job on color images, or do I have to adapt this sample?


The VPI KLT tracker is an implementation of the paper below:

Usually, the KLT tracker works well on grayscale images.

May I know your use case?
Do you have the objects similar in shape but only different in color?


Hi AastaLLL,

Usually, the KLT tracker works well on grayscale images.

It’s working fine in the tests that I’ve done, except that it loses the object when the bounding box is too big (> 64*64), as mentioned here. That shouldn’t be a problem for me.

May I know your use case?

I just need to track some distant objects, and I was wondering whether there is a tracking sample in VPI that works like the NVIDIA sample “nvx_sample_object_tracker” (it keeps the color format in RGB).

Do you have the objects similar in shape but only different in color?

Not for now, but it’s likely that I will encounter this issue. In that case I will probably add an AI on top of the recognition phase, or change the color space of the input image (LAB, etc.) if I need to. Is there a better solution?

Hi @Bazziil, regarding my topic that you mentioned: as you have read there, they are working to solve this problem and will release an update. Until then, I think it is a bit of a stretch to use this tracker.
For the moment I can recommend the tracking sample from VisionWorks with CUDA. It is the same algorithm as in VPI, only the library changes.

CUDA Layer Object Tracker Sample App

You can find everything in the VisionWorks documentation and in the examples on the board.

Hi @saromano,
I have tried the sample “nvx_sample_object_tracker_nvxcu” in VisionWorks 1.6. Is this the one you are talking about?
As I said, the bounding box size shouldn’t be a problem for me, but thanks for your recommendation. This will be my backup solution.


Although the image needs to be converted to grayscale for the KLT tracker,
you can still use the original color image for other processing, like warping.

Regarding the topic “Small size bbox KLT Feature Tracker”: do you have a sequence for the big bbox use case?


Hi AastaLLL,

Although the image needs to be converted to grayscale for the KLT tracker,

I know that. The saved frame will be grayscale unless I modify the code and draw the bounding box on the initial image.

Regarding the topic “Small size bbox KLT Feature Tracker”: do you have a sequence for the big bbox use case?

Not yet, but it will probably not be an issue for me. I just noticed that big bboxes don’t work when I was testing the algorithm (with some random video).


Would you mind testing the large bbox use case with VPI v1.0, which is included in JetPack 4.5?
The size limitation is supposed to be removed in the v1.0 release.


Hi AastaLLL,

I get the same result as before. I can confirm that when my bbox is 30 units wide and 90 units high, it gets lost instantly.
When it’s 30 units wide and 62 units high, it works until the height becomes too big.

What I did to test it:

  • Updated my JetPack version to 4.5 with SDKManager
  • Copied the files in /opt/nvidia/vpi1/samples/06-klt_tracker to ~/Documents
  • Ran cmake . and make, then ran the executables with the correct arguments.


Do you have test data so we can check this with our internal team?

Hi AastaLLL, sorry for the late reply.
I tested the klt_tracker algorithm on your test video “dashcam.mp4”, printing the bbox size, with a text descriptor containing only “427 562 342 14 16”. Here is the resulting bbox output:

Frame number : 464
x : 438.488
y : 353.192
w : 53.0545
h : 60.6337
465 → update 1
Frame number : 465
x : 384.424
y : 366.494
w : 78.185
h : 88.5114
466 → dropped 1

I computed the coordinates as follows:

std::cout << " x : " << bboxes[b].bbox.xform.mat3[0][2] << std::endl;
std::cout << " y : " << bboxes[b].bbox.xform.mat3[1][2] << std::endl;
std::cout << " w : " << bboxes[b].bbox.width * bboxes[b].bbox.xform.mat3[0][0] << std::endl;
std::cout << " h : " << bboxes[b].bbox.height * bboxes[b].bbox.xform.mat3[1][1]<< std::endl;
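The size computation above can be wrapped in a small helper that also flags boxes beyond the size at which tracking was lost. This is only a minimal sketch: `Xform`, `BBox`, `scaledSize` and `exceedsLimit` are hypothetical stand-ins that mirror just the fields printed above (`xform.mat3`, `width`, `height`), not the real VPI types, and the 64-pixel default is the limit reported in this thread, not a documented constant.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins mirroring only the fields used in the printout above.
// These are NOT the real VPI structures.
struct Xform { float mat3[3][3]; };
struct BBox  { Xform xform; float width; float height; };

// On-screen (width, height): template size times the diagonal scale terms.
inline std::pair<float, float> scaledSize(const BBox &b)
{
    return std::make_pair(b.width * b.xform.mat3[0][0],
                          b.height * b.xform.mat3[1][1]);
}

// True when either side exceeds the limit (64 is the value reported in this thread).
inline bool exceedsLimit(const BBox &b, float limit = 64.0f)
{
    assert(limit > 0.0f);
    const std::pair<float, float> s = scaledSize(b);
    return s.first > limit || s.second > limit;
}
```

Calling `exceedsLimit(box)` before each tracking step would make it easy to log the frame at which a box crosses the threshold, as frame 465 does in the output above.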

But this algorithm, unlike the VisionWorks object_tracker_nvxcu, returns the grayscale image, which is not what I want.
I tried to copy the original image and draw the bboxes on it, but I have not succeeded yet. My camera returns images in the VPI_IMAGE_FORMAT_BGR8 format, and the function SaveKLTBoxes() doesn’t support this format.

For now I’m using the VisionWorks algorithms because they offer many more possibilities. I will look into VPI in more detail tonight.

Was it a good example, or do you need another one?

BTW, are you still working on increasing the supported bbox tracking size?

I heard that there will soon be a VPI Devblog. When will it be published?


We are checking the bbox size issue internally.
We will update you once we get any feedback.

To draw the bounding box on a color image, you can try the OpenCV library.

Below is a sample that uses VPI with OpenCV.
You can change it to use cv.rectangle for the bounding box.
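If pulling in OpenCV is not an option, the same idea can be sketched without any dependency: keep the original BGR frame and write the box outline straight into its pixels. The helper below is a hypothetical illustration (`drawBoxBGR8` is not a VPI or OpenCV function). It assumes a tightly packed, interleaved 8-bit BGR buffer with no row padding, which a real camera or VPI buffer may not satisfy. With OpenCV, a single cv::rectangle call on the color image serves the same purpose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Draws a 1-pixel rectangle outline into a tightly packed, interleaved BGR8 image.
// Edges that fall outside the image are clamped to the image border.
inline void drawBoxBGR8(std::vector<std::uint8_t> &img, int width, int height,
                        int x, int y, int w, int h,
                        std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height * 3);

    const int x0 = std::max(x, 0);
    const int y0 = std::max(y, 0);
    const int x1 = std::min(x + w - 1, width - 1);
    const int y1 = std::min(y + h - 1, height - 1);
    if (x0 > x1 || y0 > y1) return; // box lies entirely outside the image

    auto put = [&](int px, int py) {
        std::uint8_t *p = &img[(static_cast<std::size_t>(py) * width + px) * 3];
        p[0] = b; p[1] = g; p[2] = r;
    };
    for (int px = x0; px <= x1; ++px) { put(px, y0); put(px, y1); } // top and bottom
    for (int py = y0; py <= y1; ++py) { put(x0, py); put(x1, py); } // left and right
}
```

The tracker still runs on the grayscale copy. Only the drawing step touches the color frame, using the x, y, w and h values computed from the tracked box.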



Can you run the algorithm 3x separately, once for each colour channel?

If a box is lost in 1 channel, you can keep checking for the bounding box on other channels and maybe rediscover the bounding box on the lost channel at a later point, if that makes sense.


This is a possible way to do this.

But you may need to handle the inconsistency between different channels.
For example, a box is detected in the red channel but missed in the green channel.
Then it will be challenging to tell a false alarm from a color-specific detection.
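One way to make that inconsistency explicit is to fuse the three per-channel results into a single status instead of trusting any one channel. The following is only a sketch of such a voting rule. `Fused`, `fuseChannels` and the `minAgree` parameter are hypothetical names, and the rule itself (agreement of at least `minAgree` channels) is an assumption, not something VPI provides.

```cpp
#include <array>
#include <cassert>

enum class Fused { Tracked, Lost, Ambiguous };

// tracked[c] is true if the box is still tracked in channel c (B, G, R).
// minAgree is the number of channels that must agree to accept the box.
inline Fused fuseChannels(const std::array<bool, 3> &tracked, int minAgree = 2)
{
    assert(minAgree >= 1 && minAgree <= 3);
    int n = 0;
    for (bool t : tracked) n += t ? 1 : 0;

    if (n >= minAgree) return Fused::Tracked;
    if (n == 0) return Fused::Lost;
    return Fused::Ambiguous; // may be a false alarm or a color-specific detection
}
```

The `Ambiguous` case is exactly the situation described above: the code can report it, but deciding what it means still needs extra information, such as the history of the box or a detector.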

In general, we recommend users do this on luminance data only.
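For reference, luminance can be derived from a BGR frame with a weighted sum of the three channels, so the color frame stays available for display while the tracker receives a single plane. The sketch below uses integer BT.601 weights, which is an assumption: the conversion VPI performs internally may use different coefficients. `bgrToLuma` is a hypothetical helper and expects a tightly packed BGR8 buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts tightly packed BGR8 pixels to 8-bit luma using integer BT.601 weights.
// The weights 29 (B), 150 (G) and 77 (R) sum to 256, so the result stays in 0..255.
inline std::vector<std::uint8_t> bgrToLuma(const std::vector<std::uint8_t> &bgr)
{
    assert(bgr.size() % 3 == 0);
    std::vector<std::uint8_t> gray(bgr.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const unsigned b = bgr[3 * i];
        const unsigned g = bgr[3 * i + 1];
        const unsigned r = bgr[3 * i + 2];
        // Add 128 before the shift to round to the nearest integer.
        gray[i] = static_cast<std::uint8_t>((29 * b + 150 * g + 77 * r + 128) >> 8);
    }
    return gray;
}
```

Because all three channels contribute to one value, two objects that differ only in hue can end up with similar luma, which is the case raised earlier in this thread about objects that differ only in color.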