NvDCF visual similarity

Could you please provide some more details on how visual similarity in the NvDCF tracker is calculated and how it works? The only description I could find is:

The visual similarity is computed based on the correlation response of the tracker at the detector bbox location

It's related to the DCF core algorithm. Simply put, you can think of it as a measure of how likely it is that the detected object looks like the previously tracked object.
For more details, you can refer to the papers on Discriminative Correlation Filters (DCF).
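To make the idea concrete, here is a toy, single-channel MOSSE-style correlation filter sketch. This is only an illustration of the general DCF principle (learn a filter from the tracked object's patch, evaluate its correlation response on a candidate patch, and treat the response peak as a similarity score); it is not NvDCF's actual implementation, and all function names here are made up for the example:

```python
import numpy as np

def learn_filter(template_patch, reg=1e-2):
    """Learn a correlation filter from one template patch (MOSSE-style,
    single frame, single channel). Illustrative only, not NvDCF's code."""
    h, w = template_patch.shape
    # Desired response: a Gaussian peaked at the patch center.
    ys, xs = np.mgrid[0:h, 0:w]
    sigma = min(h, w) / 10.0
    g = np.exp(-((ys - h / 2) ** 2 + (xs - w / 2) ** 2) / (2.0 * sigma ** 2))

    F = np.fft.fft2(template_patch)
    G = np.fft.fft2(g)
    # Closed-form filter in the Fourier domain; reg avoids division by zero.
    return (G * np.conj(F)) / (F * np.conj(F) + reg)

def correlation_response(H, candidate_patch):
    """Evaluate the learned filter on a candidate patch. The response map's
    peak value is what a DCF-style tracker can use as a visual-similarity
    score at a detector bbox location: a high peak means the candidate
    resembles the tracked object."""
    Z = np.fft.fft2(candidate_patch)
    return np.real(np.fft.ifft2(H * Z))

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
H = learn_filter(patch)
resp = correlation_response(H, patch)  # same patch -> strong central peak
similarity = resp.max()
```

In this sketch, evaluating the filter on the exact same patch yields a response close to the ideal Gaussian, with its peak at the patch center; a patch showing a different object would produce a weaker, less structured response.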

Could you please link me to the DCF paper you are referring to? I am not sure which document would answer my question.