Cache coherence problem when using CUDA functions on Xavier

Hi All,

I wrote a few CUDA functions that use streams, as explained on page 11 of the link below:

My CUDA function takes YUV422 input, splits it into separate buffers for Y, U and V, and resizes Y. I then use NPP functions for the remap. After copying the data to the host and displaying it with cv::imshow, I see artifacts that seem to be caused by a cache coherence problem.
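To make the split step concrete, here is a minimal CPU reference sketch. It is not the actual kernel. It assumes YUYV byte order (Y0 U Y1 V) and tightly packed rows, and the names `Yuv422Planes` and `splitYuyv` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar output of the split step. Hypothetical helper type.
struct Yuv422Planes {
    std::vector<std::uint8_t> y;  // width * height samples
    std::vector<std::uint8_t> u;  // (width / 2) * height samples
    std::vector<std::uint8_t> v;  // (width / 2) * height samples
};

// CPU reference: split packed YUV 4:2:2 into separate Y, U and V planes.
// ASSUMPTION: YUYV byte order (Y0 U Y1 V) and tightly packed rows
// (row pitch == width * 2). UYVY input would need the offsets swapped.
Yuv422Planes splitYuyv(const std::uint8_t* src, int width, int height) {
    assert(src != nullptr && width > 0 && height > 0 && width % 2 == 0);

    // Each 4-byte group carries two Y samples sharing one U and one V.
    const std::size_t pairs = static_cast<std::size_t>(width / 2) * height;

    Yuv422Planes out;
    out.y.resize(pairs * 2);
    out.u.resize(pairs);
    out.v.resize(pairs);

    for (std::size_t i = 0; i < pairs; ++i) {
        const std::uint8_t* p = src + i * 4;
        out.y[2 * i]     = p[0];
        out.u[i]         = p[1];
        out.y[2 * i + 1] = p[2];
        out.v[i]         = p[3];
    }
    return out;
}
```

Comparing the Y, U and V buffers copied back from the GPU against this reference, byte by byte, should show whether the data is already wrong after the split or only after the later resize and remap steps.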

The Xavier has CUDA 10.0, and its NPP does not support the NPP functions with the Ctx suffix (for example nppiRemap_8u_C1R_Ctx) as in CUDA 10.2; see the link below:

When I use RGB input with NPP functions only, I don't see the cache coherence problem.

Can anyone please advise?



We would like to reproduce this issue in our environment before giving further suggestions.
Could you provide a simple reproducible source for us to debug?