I wrote a few CUDA functions that use streams, as explained on page 11 of the link below:
My CUDA function takes YUV422 input, splits it into separate buffers for Y, U and V, and resizes Y. I then use NPP functions for the remap. After copying the data to the host and displaying it with cv::imshow, I see artifacts that seem to be caused by a cache coherence problem.
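One way to narrow this down is to compare the Y, U and V buffers produced on the GPU against a plain CPU reference before the resize and remap steps run. The sketch below is only an illustration, not the kernel from this post: the name `splitYuyv422Reference` and the `Yuv422Planes` struct are hypothetical, and it assumes packed YUYV byte order (Y0 U Y1 V), tightly packed rows and an even width. If the real input is UYVY, the byte offsets need to be swapped.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Planar output: full-resolution Y, horizontally subsampled U and V.
struct Yuv422Planes {
    std::vector<std::uint8_t> y;  // width * height bytes
    std::vector<std::uint8_t> u;  // (width / 2) * height bytes
    std::vector<std::uint8_t> v;  // (width / 2) * height bytes
};

// Hypothetical CPU reference for the split step.
// ASSUMPTION: packed YUYV order (Y0 U Y1 V), tightly packed rows, even width.
Yuv422Planes splitYuyv422Reference(const std::vector<std::uint8_t>& packed,
                                   int width, int height) {
    if (width <= 0 || height <= 0 || width % 2 != 0) {
        throw std::invalid_argument("width must be positive and even, height positive");
    }
    const std::size_t pixels =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    if (packed.size() != pixels * 2) {
        throw std::invalid_argument("packed buffer must hold 2 bytes per pixel");
    }

    Yuv422Planes out;
    out.y.resize(pixels);
    out.u.resize(pixels / 2);
    out.v.resize(pixels / 2);

    // Each group of 4 packed bytes describes 2 pixels that share one U and one V.
    for (std::size_t pair = 0; pair < pixels / 2; ++pair) {
        const std::uint8_t* src = &packed[pair * 4];
        out.y[2 * pair]     = src[0];
        out.u[pair]         = src[1];
        out.y[2 * pair + 1] = src[2];
        out.v[pair]         = src[3];
    }
    return out;
}
```

Copying the three device buffers back to the host right after the split and comparing them byte for byte with this reference should show whether the corruption is already present after the custom kernel or only appears after the NPP calls.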
The Xavier has CUDA 10.0, and its NPP does not support the _Ctx variants of the NPP functions (for example nppiRemap_8u_C1R_Ctx) that are available in CUDA 10.2; see the link below:
When I use RGB input with NPP functions only, I don’t see the cache coherence problem.
Can anyone please advise?