24fps real-time video player w/ kernel that takes longer than 1/24s?

foo · October 20, 2008, 5:20pm

Hi guys, I need some advice. Here is what I want to do:

I am implementing a video player that decodes images on the GPU and displays them with OpenGL. Because it gives me better performance, I always decode multiple frames at once in a single kernel execution (say, a GOP size of 8). This kernel execution takes significantly longer than 1/24 s, but it decodes 8 frames in less than 8/24 s, so in theory real-time playback at 24 fps should be feasible.
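
To put numbers on that reasoning, here is a minimal sketch of the timing budget. The 0.25 s kernel time is a made-up value for illustration, not a measurement, and the check covers throughput only.

```python
from fractions import Fraction

FPS = 24        # target playback rate from the post
GOP_SIZE = 8    # frames decoded per kernel execution, as in the post

frame_budget = Fraction(1, FPS)         # seconds available per displayed frame
gop_budget = GOP_SIZE * frame_budget    # seconds available per batch of 8 frames


def keeps_up(kernel_seconds):
    """True if one batch decode finishes within the time its frames take to play."""
    return kernel_seconds < gop_budget


# Hypothetical kernel time: longer than one frame period, shorter than the batch budget.
example_kernel = Fraction(1, 4)
print(float(frame_budget), float(gop_budget), keeps_up(example_kernel))
```

A kernel that passes this check has enough throughput on paper, which is the "in theory" part of the claim above.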

Now, when I start a sequence, the player suffers from quite regular hiccups, i.e. it stutters. A couple of frames play in real time, then the player pauses for a moment, and so on.

Is this because the OpenGL drawing functions have to wait for the really long decoding kernel execution? Or can OpenGL functions and CUDA kernels run concurrently? Btw, I am running OpenGL and CUDA in two different host threads, and the contexts share the PBOs that are used to copy the decoded frames from a CUDA buffer to an OpenGL texture.

Hope the problem is clear. Any ideas?

Ailleur · October 20, 2008, 6:23pm

Not sure if this is exactly what you’re doing, but have a look at this:
http://forums.nvidia.com/index.php?showtopic=60756&st=0&p=340731&#entry340731

SPWorley · October 21, 2008, 12:01am

If you’re using the same card for both CUDA and display, you’ll run into problems. The GPU cannot run two kernels simultaneously… and OpenGL is effectively a kernel. So you need to make sure that your kernels are significantly shorter than 1/24 of a second in order to give the display enough time slices.

This doesn’t preclude you from doing your buffered 8-at-once strategy. Just structure your compute to use several sequential shorter kernel calls. That can be annoying but perhaps not too bad depending on your algorithm.
Don’t be too concerned with the overhead of even dozens of kernel launches; it’s quite minor in practice.
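
As a rough illustration of the splitting idea, here is a minimal sketch that works out how many shorter launches a batch would need. The function name `plan_launches`, the 0.25 s total, and the cap of a quarter of a frame period per launch are all assumptions for the example, and it assumes the work divides evenly, which a real decoder may not allow.

```python
import math
from fractions import Fraction

FRAME_PERIOD = Fraction(1, 24)   # seconds between displayed frames at 24 fps


def plan_launches(total_seconds, max_slice_seconds):
    """Return (launch_count, seconds_per_launch) with each launch within the cap.

    Assumes the total work can be divided evenly across launches.
    """
    count = max(1, math.ceil(total_seconds / max_slice_seconds))
    return count, total_seconds / count


# Hypothetical numbers: a 0.25 s batch decode, each launch capped at 1/4 frame period.
count, per_launch = plan_launches(Fraction(1, 4), FRAME_PERIOD / 4)
print(count, float(per_launch))
```

The cap is the tuning knob here: a smaller cap leaves the display more room between launches at the cost of more launches per batch.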

ColinS · October 21, 2008, 12:47am

If it’s possible, you might want to look at the problem from a different angle. Instead of computing eight frames at once, could you split each frame into 16 (or more) sub-frames, compute those sub-frames in parallel, and calculate the frames sequentially? This works well for some problems and not for others, but it may be worth considering. It would let you calculate a frame, call OpenGL, calculate the next frame, call OpenGL, and so on.
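
A minimal sketch of what splitting one frame into 16 sub-frames could look like, assuming a plain 4x4 grid of rectangles. The helper name `split_frame` and the 1920x1080 frame size are hypothetical, and whether a codec can decode such tiles independently is a separate question.

```python
def split_frame(width, height, cols=4, rows=4):
    """Return (x, y, w, h) tiles that exactly cover a width x height frame.

    Tile sizes differ by at most one pixel when the frame does not divide evenly.
    """
    tiles = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles


tiles = split_frame(1920, 1080)   # hypothetical frame size
print(len(tiles), tiles[0], tiles[-1])
```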

foo · October 21, 2008, 9:52am

@Ailleur: Yes, I do have to use PBOs to make my decoded frames available to OpenGL. I register them with CUDA only once at the beginning and only map and unmap them as each frame is decoded. But no, I cannot confirm that these are time-consuming operations. They execute in under 1 ms. Possibly this issue was fixed with CUDA 2.0.

@SPWorley: This is exactly what I was afraid of. I wish the architecture were designed so that multiple kernels could run at once, but there are probably many good reasons why they can’t. I had the same workaround you suggested in mind and will now tackle it.

@ColinS: Unfortunately I cannot reveal exactly what kind of codec I am working on, but your suggestion does not apply to my problem. Thanks anyway.

foo · October 22, 2008, 1:32pm

I have successfully separated my big fat kernel function into a bunch of shorter ones. But now, when I play back a video at 24fps, after 3 or 4 seconds the display turns gray and I have to restart my PC. The only advance warning I get is that the GPU fan noise level rises significantly, but I guess that’s normal when I keep the GPU so busy.

Has anyone experienced a similar problem? Does the Nvidia driver create a crash log of some sort?

alex_dubinsky · October 23, 2008, 1:37am

The GPU fan noise rises when the GPU is heating up. Is the inside of your case well ventilated? Buy some case fans.

foo · October 23, 2008, 7:09am

According to PC Wizard 2008, the GPU temperature rises from its normal temperature of 65°C to 86°C before the PC crashes with a grey screen. Ambient temperature rises from 51°C to 55°C.
And anyway, there is plenty of space in my PC case, as the GTX-280 is my only device so far. Temperature shouldn’t be an issue … I hope.

alex_dubinsky · October 23, 2008, 5:02pm

86 ain’t low. And how do you know it’s the GPU that crashes? It could be anything inside the case, such as the PSU that has to cope with hotter exhaust temp and greater load. Ambient 55C is quite hot.

foo · October 24, 2008, 9:10am

Okay, so far it looks like it was indeed the high temperature that caused the crashes. When I manually set the fan to 100%, everything works fine.

I’m gonna have to get used to wearing headphones in my office, though. That fan is noisy!

alex_dubinsky · October 24, 2008, 4:34pm

Dude. Do as I say. Buy more case fans to reduce your ambient. Those things are much quieter than the GPU fan anyway.
