Contexts and streams with multiple threads with TensorRT, NPP and maybe NVENC

I think I’ve got into a bit of a pickle managing contexts and streams across TensorRT and NPP (and I might want to add NVENC to that).

I have multiple feeds that are pre-processed in one thread with NPP, then handed over to another thread that does more nppi work followed by a TensorRT inference. From there the result is either copied from CUDA to host memory or passed to yet another thread that runs a second TensorRT engine with more nppi work (I know, it’s all over the place, but there we go).

I’m confused by the different uses of the word context. In TensorRT you have IExecutionContext, and in NPP you have NppStreamContext, but I’m guessing that’s quite different and NppStreamContext is just a way of holding onto the stream info? At the moment I’m using nppSetStream but I’ve run into a crash. I think this may be because I’m preprocessing on the default stream but writing into a CUDA buffer allocated under a different stream (or context?). It sometimes works and sometimes crashes, so I’m getting confused.

I also see that to populate the NppStreamContext structure I should call functions like cudaGetDeviceProperties. Should I do this after the TensorRT execution context and its associated stream have been created? And then just make sure that memory interactions are happening on the same stream for TensorRT and NPP? I’m getting confused as to what goes where.

Thanks, Joe

So I made sure the same CUDA context was being used by all threads, simplified my thread usage in general, and got on top of managing my streams and supplying the NppStreamContext to the NPP functions.

I also found a bug caused by an nppi resize function. I misinterpreted the docs: I thought points outside the source range were ignored, but it looks like they aren’t, and they trash memory if not prevented.
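
One way to guard against that kind of out-of-range access is to intersect the requested region with the image bounds before calling the resize. The sketch below is only an illustration: Roi is a hypothetical stand-in for NppiRect (same x, y, width, height meaning) so the snippet compiles without the NPP headers, and clampRoiToImage is a made-up helper name.

```cpp
#include <algorithm>

// Hypothetical stand-in for NppiRect: top-left corner plus size, in pixels.
struct Roi {
    int x;
    int y;
    int width;
    int height;
};

// Intersect a requested ROI with an image of imageWidth x imageHeight pixels.
// Returns a zero-sized ROI when the request lies entirely outside the image.
inline Roi clampRoiToImage(Roi requested, int imageWidth, int imageHeight) {
    // Clamp both corners of the rectangle into [0, imageWidth] x [0, imageHeight].
    const int x0 = std::min(std::max(requested.x, 0), imageWidth);
    const int y0 = std::min(std::max(requested.y, 0), imageHeight);
    const int x1 = std::min(std::max(requested.x + requested.width, 0), imageWidth);
    const int y1 = std::min(std::max(requested.y + requested.height, 0), imageHeight);
    return Roi{x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0)};
}
```

If the clamped ROI comes back with zero width or height, the resize call should be skipped. Note that shrinking the source ROI on its own changes the effective scale factor, so the destination ROI may need adjusting to match.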