Cache coherence problem when using CUDA functions on Xavier

Hi All,

I wrote a few CUDA functions that use streams, as explained on page 11 of the document linked below:

https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf

My CUDA function takes a YUV422 frame, splits it into separate buffers for Y, U and V, and resizes Y. I then use NPP functions for the remap. After copying the data to the host and displaying it with cv::imshow, I see problems that seem to be caused by a cache coherence issue.

The Xavier has CUDA 10.0, and its NPP does not support the _Ctx variants of the NPP functions (for example nppiRemap_8u_C1R_Ctx) that are available in CUDA 10.2; see the link below:

https://docs.nvidia.com/cuda/npp/group__image__remap.html
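
For context, here is a minimal sketch of the stream handling this kind of pipeline depends on. It makes several assumptions that are not stated above: the YUV422 input is packed YUYV (Y0 U Y1 V), the kernel splitYuyv and the helper runSplitOnStream are hypothetical names, and the resize and remap steps are left out. Because the _Ctx variants are not available, nppSetStream is used so that non-Ctx NPP calls would run on the same stream as the custom kernel, and the host reads the output only after cudaStreamSynchronize. It shows one ordering worth checking, not a confirmed cause or fix.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <npp.h>

// Sketch only: on error this returns early without freeing resources.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::printf("CUDA error: %s\n", cudaGetErrorString(e_)); return false; } } while (0)

// Hypothetical kernel: split packed YUYV (assumed layout Y0 U Y1 V) into planes.
__global__ void splitYuyv(const unsigned char* src, unsigned char* y,
                          unsigned char* u, unsigned char* v, int numPairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPairs) return;
    y[2 * i]     = src[4 * i];
    u[i]         = src[4 * i + 1];
    y[2 * i + 1] = src[4 * i + 2];
    v[i]         = src[4 * i + 3];
}

// Hypothetical helper: runs the split on one stream and checks the Y plane on the host.
bool runSplitOnStream(int width, int height)   // width assumed even
{
    const int numPairs = width * height / 2;
    const size_t packedBytes = (size_t)numPairs * 4;
    const size_t yBytes = (size_t)numPairs * 2;

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    nppSetStream(stream);   // non-Ctx NPP calls issued after this use this stream

    unsigned char *hSrc, *hY;           // pinned host buffers for async copies
    CHECK(cudaMallocHost(&hSrc, packedBytes));
    CHECK(cudaMallocHost(&hY, yBytes));
    for (size_t i = 0; i < packedBytes; ++i) hSrc[i] = (unsigned char)(i & 0xFF);

    unsigned char *dSrc, *dY, *dU, *dV;
    CHECK(cudaMalloc(&dSrc, packedBytes));
    CHECK(cudaMalloc(&dY, yBytes));
    CHECK(cudaMalloc(&dU, (size_t)numPairs));
    CHECK(cudaMalloc(&dV, (size_t)numPairs));

    CHECK(cudaMemcpyAsync(dSrc, hSrc, packedBytes, cudaMemcpyHostToDevice, stream));
    const int threads = 256;
    const int blocks = (numPairs + threads - 1) / threads;
    splitYuyv<<<blocks, threads, 0, stream>>>(dSrc, dY, dU, dV, numPairs);
    CHECK(cudaGetLastError());
    // The NPP resize/remap calls would be issued here, on the same stream.
    CHECK(cudaMemcpyAsync(hY, dY, yBytes, cudaMemcpyDeviceToHost, stream));

    // The host must not touch hY until all work queued on the stream has finished.
    CHECK(cudaStreamSynchronize(stream));

    bool ok = true;
    for (int i = 0; i < numPairs && ok; ++i)
        ok = (hY[2 * i] == hSrc[4 * i]) && (hY[2 * i + 1] == hSrc[4 * i + 2]);

    nppSetStream(0);        // restore the default stream before destroying ours
    cudaFree(dSrc); cudaFree(dY); cudaFree(dU); cudaFree(dV);
    cudaFreeHost(hSrc); cudaFreeHost(hY);
    cudaStreamDestroy(stream);
    return ok;
}
```

Calling runSplitOnStream(64, 4) from main and checking the return value is enough to try it; it needs to be built with nvcc and linked against the NPP core library.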

When I use RGB input with NPP functions only, I don't see the cache coherence problem.

Can anyone please advise?

Thanks,
Gabi

Hi,

We would like to reproduce this issue in our environment before giving further suggestions.
Could you provide a simple reproducible sample for us to debug?

Thanks.