I'm new to CUDA. Will ‘nvivafilter’ execute the linked code the same way as simple CUDA kernels, or are there limitations I should take into account? Based on the nvsample_cudaprocess.cu example, it seems so.
Also, can you point me to documentation or an example that shows how to pass data between the CPU and the GPU? Our use case is a simple C++ app that receives events and, based on them, updates the video overlay. Is there a ‘gstreamer’-ish asynchronous, event-based communication mechanism, or should I use cudaMallocManaged to allocate variables in memory shared between the CPU and the GPU?
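A minimal sketch of the cudaMallocManaged approach. It assumes the frame is available as a device-accessible, pitched, single-channel (e.g. luma) plane, similar to what the nvsample_cudaprocess.cu example works with in its gpu_process step. OverlayParams, draw_rect, create_overlay_params, update_overlay and render_overlay are hypothetical names for illustration, not part of nvivafilter. The event handler writes the parameters and the per-frame code reads them. A mutex serializes the two, and the writer waits for the device to go idle first, because on devices that report concurrentManagedAccess == 0 the CPU must not touch managed memory while the GPU is busy (check cudaDeviceProp::concurrentManagedAccess on your board).

```cuda
#include <mutex>
#include <cuda_runtime.h>

// Hypothetical overlay state, filled in by the host-side event handler.
struct OverlayParams {
    int x, y, w, h;        // rectangle position and size in pixels
    unsigned char value;   // value painted inside the rectangle
};

// Serializes event-handler writes against per-frame kernel launches.
static std::mutex g_overlay_mutex;

// Paints the rectangle described by *p into a single-channel pitched plane.
__global__ void draw_rect(unsigned char *plane, int width, int height, int pitch,
                          const OverlayParams *p) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= width || row >= height) return;
    if (col >= p->x && col < p->x + p->w && row >= p->y && row < p->y + p->h)
        plane[row * pitch + col] = p->value;
}

// Allocates the parameter block once; CPU and GPU use the same pointer.
OverlayParams *create_overlay_params() {
    OverlayParams *p = nullptr;
    if (cudaMallocManaged(&p, sizeof(OverlayParams)) != cudaSuccess) return nullptr;
    *p = OverlayParams{0, 0, 0, 0, 0};   // empty rectangle: nothing is painted
    return p;
}

// Called from the event handler thread.
void update_overlay(OverlayParams *p, int x, int y, int w, int h, unsigned char v) {
    std::lock_guard<std::mutex> lock(g_overlay_mutex);
    cudaDeviceSynchronize();   // wait for outstanding device work before the CPU writes
    *p = OverlayParams{x, y, w, h, v};
}

// Called once per frame: launch the kernel and wait for it to finish.
cudaError_t render_overlay(unsigned char *plane, int width, int height, int pitch,
                           const OverlayParams *p) {
    std::lock_guard<std::mutex> lock(g_overlay_mutex);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    draw_rect<<<grid, block>>>(plane, width, height, pitch, p);
    cudaError_t err = cudaGetLastError();   // launch-configuration errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // execution errors, and GPU idle on return
}
```

For only a handful of overlay values, a simpler alternative is to copy them into a plain struct under the mutex and pass that struct to the kernel by value, which avoids managed memory and its access rules entirely.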
Using this approach, we have now implemented a much more performant solution. Thank you for the suggestion!
But as I’m pretty new to GPU programming, the solution itself is naive, and I'm confident there is a lot of room for further improvement.
What I'm not sure about is how to profile or debug the nvivafilter. I looked around, and as far as I can see there are very nice tools for this task for standalone CUDA applications; nvprof and cuda-memcheck come up most often. Can I use these with GStreamer, and if so, how? Or is there another way to gain performance insights from the nvivafilter?
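One lightweight complement to external profilers is to time the filter's own GPU work with CUDA events and check for errors after every launch, which works no matter how GStreamer loads the library. A minimal sketch, assuming a pitched single-channel plane; invert_plane is a hypothetical stand-in for the real per-frame kernel, and FrameTimer, timer_init, timed_process and timer_destroy are illustrative names, not nvivafilter API. It prints a running average GPU time to stderr every 100 frames.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-frame kernel launched from gpu_process().
__global__ void invert_plane(unsigned char *plane, int width, int height, int pitch) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        plane[row * pitch + col] = 255 - plane[row * pitch + col];
}

// Accumulates measured GPU time across frames.
struct FrameTimer {
    cudaEvent_t start = nullptr, stop = nullptr;
    float total_ms = 0.0f;
    int frames = 0;
};

bool timer_init(FrameTimer &t) {
    return cudaEventCreate(&t.start) == cudaSuccess &&
           cudaEventCreate(&t.stop) == cudaSuccess;
}

// Times one frame's kernel; returns the first CUDA error encountered, if any.
cudaError_t timed_process(FrameTimer &t, unsigned char *plane,
                          int width, int height, int pitch) {
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    cudaEventRecord(t.start);
    invert_plane<<<grid, block>>>(plane, width, height, pitch);
    cudaError_t err = cudaGetLastError();   // launch-configuration errors
    cudaEventRecord(t.stop);
    if (err == cudaSuccess)
        err = cudaEventSynchronize(t.stop); // may also report errors from the kernel run
    if (err != cudaSuccess) return err;
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t.start, t.stop);
    t.total_ms += ms;
    if (++t.frames % 100 == 0)
        fprintf(stderr, "avg GPU time per frame: %.3f ms\n", t.total_ms / t.frames);
    return cudaSuccess;
}

void timer_destroy(FrameTimer &t) {
    cudaEventDestroy(t.start);
    cudaEventDestroy(t.stop);
}
```

Since nvprof and cuda-memcheck wrap an entire process, and GStreamer plugins run inside the process that launches the pipeline, they can in principle be pointed at that launcher (e.g. the gst-launch-1.0 command). Event timing is still useful for isolating the per-frame kernel cost from the rest of the pipeline.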
When I compile your nvsample_cudaprocess.cu code with the make command, customer_functions.h is not found.
Can you help me find this header file?
fatal error: customer_functions.h: No such file or directory