Why does the NLM2 method in the CUDA sample “imageDenoising” run faster than NLM? As far as I understand, the only difference is that NLM2 computes the filter weights and writes them to shared memory, while NLM does not. I can't understand why NLM2 is faster than NLM, because as far as I can tell the two have the same algorithmic intensity (the same amount of work per pixel).
The documentation says: “Quick NLM has one additional parameter – the block of pixels that share weights. This parameter is crucial in speeding NLM.” So why is this parameter “crucial”?
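To make the quoted “block of pixels that share weights” concrete, here is a minimal CPU sketch in Python. It is not the CUDA sample's code. It assumes that the quick variant computes one set of window weights per `tile × tile` group of pixels (here at the group's top-left pixel) and reuses that set for every pixel in the group, while plain NLM recomputes the weights for every pixel. The function `nlm`, its parameters `win_r` (search-window radius), `blk_r` (patch radius), `h` (filter strength), `tile`, and the 8×8 tile size are all hypothetical illustrations. The function returns the filtered image together with the number of patch-distance evaluations, which is the expensive part.

```python
import math
import numpy as np

def nlm(img, win_r=3, blk_r=3, h=0.1, tile=None):
    """Non-local means on a 2-D array (hypothetical sketch).

    tile=None: weights recomputed for every pixel (plain-NLM style).
    tile=k:    weights computed once per k x k group of pixels and shared
               by all pixels in that group (assumed quick-NLM style).
    Returns (filtered image, number of patch-distance evaluations).
    """
    pad = win_r + blk_r
    p = np.pad(img.astype(float), pad, mode="reflect")
    H, W = img.shape
    offsets = [(dy, dx) for dy in range(-win_r, win_r + 1)
                        for dx in range(-win_r, win_r + 1)]
    out = np.empty((H, W))
    cache = {}
    evals = 0

    def weights_at(cy, cx):
        # (cy, cx) are padded coordinates of the reference pixel.
        nonlocal evals
        ref = p[cy - blk_r:cy + blk_r + 1, cx - blk_r:cx + blk_r + 1]
        ws = []
        for dy, dx in offsets:
            nb = p[cy + dy - blk_r:cy + dy + blk_r + 1,
                   cx + dx - blk_r:cx + dx + blk_r + 1]
            # Patch distance = mean squared difference of the two patches.
            ws.append(math.exp(-float(np.mean((ref - nb) ** 2)) / (h * h)))
            evals += 1
        return ws

    for y in range(H):
        for x in range(W):
            if tile is None:
                ws = weights_at(y + pad, x + pad)  # fresh weights per pixel
            else:
                key = (y // tile, x // tile)
                if key not in cache:  # computed once per tile, then reused
                    cache[key] = weights_at(key[0] * tile + pad,
                                            key[1] * tile + pad)
                ws = cache[key]
            cy, cx = y + pad, x + pad
            # Weighted average of the pixel's own window; the (0, 0) weight
            # is exp(0) = 1, so the denominator is never zero.
            num = sum(w * p[cy + dy, cx + dx] for w, (dy, dx) in zip(ws, offsets))
            out[y, x] = num / sum(ws)
    return out, evals

rng = np.random.default_rng(0)
img = rng.random((16, 16))
full, n_full = nlm(img)            # 16 * 16 pixels * 49 offsets
quick, n_quick = nlm(img, tile=8)  # 2 * 2 tiles * 49 offsets
print(n_full, n_quick)             # 12544 196
```

In this sketch the number of patch-distance evaluations drops by a factor of `tile²` (64 for 8×8 tiles), at the cost of approximating each pixel's weights with those of its tile's reference pixel. With `tile=1` the two modes produce identical output.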