I’m a Lightworks (NLE) user. I’m running Manjaro Linux with the 384.90 drivers. I bought a mobile workstation to better organise my work, and I’m disappointed with the general Nvidia GPU performance.
There is a problem with sleep/wakeup. After resuming from sleep, the GPU is stuck at a low clock rate, which affects overall performance: PowerMizer switches to performance level 2, but the clock stays at ~135-235 MHz. After a reboot, things go back to normal.
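To rule out a reporting glitch, I also read the clocks with a small NVML program (a minimal sketch, assuming libnvidia-ml and the nvml.h header are available; NVML is the same library nvidia-smi uses):

```c
/* clock-check.c: minimal sketch, read current vs. max graphics clock.
 * Build: gcc clock-check.c -o clock-check -lnvidia-ml
 */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    unsigned int cur = 0, max = 0;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        /* current and maximum graphics clock, in MHz */
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_GRAPHICS, &cur);
        nvmlDeviceGetMaxClockInfo(dev, NVML_CLOCK_GRAPHICS, &max);
        printf("graphics clock: %u MHz (max %u MHz)\n", cur, max);
    }

    nvmlShutdown();
    return 0;
}
```

After a resume it keeps printing values in the ~135-235 MHz range even under load; after a reboot it reports clocks near the maximum again.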
The second issue is general performance. We have a test project, a kind of benchmark project, which shows us how Lightworks performs on different configurations and OSes. On Linux, Nvidia GPUs have always performed worse than ATI/AMD; only Windows+Nvidia is really good. If you’re curious, here are the two newest forum threads:
Export time benchmarks: https://www.lwks.com/index.php?option=com_kunena&func=view&catid=217&id=156673&Itemid=81#157781
General GPU performance tests: https://www.lwks.com/index.php?option=com_kunena&func=view&catid=217&id=150643&Itemid=81 (long story)
Please note that these benchmarks aren’t scientific, but they show how things work in real-world usage.
Quick performance comparison
These results are from measuring the export time of the Lightworks test project, run on Manjaro Linux:
As you can see, Nvidia GPU performance in this use case is very poor. We cannot determine why it is so slow. We only know that AMD/Radeon and Windows+Nvidia work far better than Linux+Nvidia.
Is there anything we can do to improve the performance on Linux? Why is it so slow? Why do ATI/AMD (open-source drivers) and older cards do far better (for our use case) than powerful Nvidia chips?
I see that nvidia-drm is not loaded in /proc/modules. This may be due to my Bumblebee setup (I’m using the Intel GPU for common tasks and run Lightworks via optirun/primusrun). Maybe that’s why your hint does not work for me.
At first I used Bumblebee, and the issue existed there too. I didn’t know about the bug you mentioned.
Then I reconfigured the setup manually to try PRIME. After that I went back to Bumblebee.
I’ll remove everything, reinstall and try again with your hint.
As for the slow performance with Lightworks, it is related to rendering (export), and it is probably something more exotic. Until the sleep/wakeup issue occurs, the GPU works pretty well (nvidia-smi reports max clock values): playback with complex GPU effects is far better than with the Intel GPU. However, rendering (encoding to a final MP4 file) is slower than on other systems (1m28s; Windows: 32s; Radeon on an old, cheap laptop: 1m10s).
Summing up: export (render) time on Linux is 3x slower than on Windows on the same machine. We know from the Lightworks devs that there are differences between the Linux (OpenGL) and Windows (Direct3D) ports and that Linux will always be slower due to the implementation, and our test results confirm that. But 3x is too big a gap, because the AMD drivers appear to work better in this particular case, and the difference between Windows and Linux is smaller for ATI/AMD. I’m wondering why…
I will try to apply your hint, of course, and check again. Thank you for your support.
Sorry to interrupt, but I run an i7-3770 with a GTX 960 and I have really bad stuttering issues with 384.90. The previous 384 version was doing a great job.
Nothing shows up on the FPS counter in in-game benchmarks (Dirt Showdown, Dirt Rally), but it feels like frame ordering is sometimes messed up.
I tried Composition Pipeline on/off and different UIs and flavors (KDE, Unity, GNOME, Ubuntu, Kubuntu, KDE Neon), but nothing fixes it. With 381.22 everything works as expected.
Thanks.
Edit: Maybe I shouldn’t have posted in this thread, but I was looking for something related to 384.90. Sorry, I realize I should have created a new thread.
Lightworks uses OpenCL for effects processing, so maybe my problem is related to OpenCL? Can somebody help me find the cause? I’m not the only one: there are plenty of Linux+Nvidia+Lightworks users complaining about the performance. Or maybe it is related to missing PRIME offloading?
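If it is OpenCL-related, one thing I can at least check is whether the Nvidia OpenCL platform is visible when started through optirun. A minimal sketch (assuming the OpenCL headers and ICD loader are installed):

```c
/* cl-list.c: minimal sketch, list OpenCL platforms and their GPU devices.
 * Build: gcc cl-list.c -o cl-list -lOpenCL
 * Run it both directly and via optirun/primusrun and compare the output.
 */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    for (cl_uint i = 0; i < nplat; i++) {
        char name[256];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof name, name, NULL);
        printf("platform %u: %s\n", i, name);

        cl_device_id devs[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU,
                           8, devs, &ndev) != CL_SUCCESS)
            continue;
        for (cl_uint j = 0; j < ndev; j++) {
            char dname[256];
            clGetDeviceInfo(devs[j], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            printf("  GPU device: %s\n", dname);
        }
    }
    return 0;
}
```

If the Nvidia platform doesn’t show up, the effects would be running on some other device, which could explain a lot.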
As I understand it, Lightworks uses pixel shaders, i.e. DirectX on Windows and OpenGL on Linux/macOS, for effects. De-/encoding is done on the CPU. So it’s rather a question of bad OpenGL optimization/use on the Lightworks side.
E.g. the host-to-GPU test is basically a measure of how many 1080p frames can be copied to GPU memory per second. Not that many, looking at the numbers.
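That kind of test is easy enough to reproduce outside Lightworks if you want to compare drivers directly. A minimal sketch (my own guess at such a test, not the actual LW benchmark) that streams 1080p BGRA frames into a texture through a pixel unpack buffer, assuming GLFW for context creation:

```c
/* upload-bench.c: minimal sketch, 1080p host-to-GPU upload rate.
 * Build: gcc upload-bench.c -o upload-bench -lglfw -lGL
 */
#define GL_GLEXT_PROTOTYPES
#include <GLFW/glfw3.h>
#include <GL/glext.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

enum { W = 1920, H = 1080, FRAMES = 200 };

int main(void)
{
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);        /* offscreen context */
    GLFWwindow *win = glfwCreateWindow(64, 64, "bench", NULL, NULL);
    if (!win) return 1;
    glfwMakeContextCurrent(win);

    size_t size = (size_t)W * H * 4;                 /* one BGRA frame */
    unsigned char *frame = calloc(1, size);

    GLuint tex, pbo;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, NULL);

    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < FRAMES; i++) {
        /* orphan the PBO, copy the frame in, then upload from the PBO */
        glBufferData(GL_PIXEL_UNPACK_BUFFER, size, frame, GL_STREAM_DRAW);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                        GL_BGRA, GL_UNSIGNED_BYTE, (void *)0);
    }
    glFinish();                                      /* wait for the GPU */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f frames/s, %.1f MB/s\n", FRAMES / s, FRAMES * (size / 1e6) / s);

    glfwTerminate();
    return 0;
}
```

Comparing the numbers between the Nvidia blob and Mesa on the same hardware would show whether the raw transfer path is the difference.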
PS: PRIME offload, aka render offloading, has nothing to do with performance; it’s about power saving/convenience.
GPU-Test: Lightworks - Easy to Use Pro Video Editing Software
We know from the Lightworks devs that the Linux port has some tricky parts due to multithreading issues, and they have confirmed worse performance (compared to Windows). But please look at my laptops’ results:
The Precision is the most powerful, the Vostro is mid-range, and the Inspiron is the weakest, oldest, and cheapest. As you can see, the weakest machine is the best in this test. It has a Radeon HD 8730M installed, a GPU about 2-3 times slower than the Quadro.
I can’t understand such differences. I would simply expect a better result for the Precision on Linux, somewhere around 50 seconds.
I agree that this is hard to understand, but it is also hard to pinpoint the reason.
When exporting a project, it boils down to
decode (cpu)->copy(GL?)->generate/apply effects(GL)->copy(GL?)->encode(cpu)
The question is: where is the missing time spent? Maybe LW is using GL commands that are better optimized in Mesa, or maybe the de-/encoding is GPU-assisted on Mesa. Only the LW devs can tell.
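Without the LW sources, one way to at least attribute the time is to bracket the suspected GL stages with timer queries. A minimal sketch of the idea (hypothetical helpers, assuming a GL 3.3+ context):

```c
/* Minimal sketch: time a span of GL commands on the GPU with
 * GL_TIME_ELAPSED queries (core since GL 3.3). Assumes a current
 * GL context; these are hypothetical helpers, not Lightworks code. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <stdio.h>

static GLuint query;

void gpu_timer_begin(void)
{
    if (!query)
        glGenQueries(1, &query);
    glBeginQuery(GL_TIME_ELAPSED, query);
}

void gpu_timer_end(const char *label)
{
    glEndQuery(GL_TIME_ELAPSED);

    GLuint64 ns = 0;  /* this read blocks until the GPU result is ready */
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
    printf("%s: %.3f ms\n", label, ns / 1e6);
}

/* Usage:
 *   gpu_timer_begin();
 *   ...upload / effects / readback calls...
 *   gpu_timer_end("effects pass");
 */
```

If the upload/readback brackets dominate on the blob but not on Mesa, that would point at the copies rather than the effects.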
What driver was used with the AMD? fglrx, Mesa?
Some random thoughts on the numbers:
940MX vs. HD 8730M makes sense, the latter has twice the memory transfer rate.
The Dell Precision numbers are completely off, even for the Intel GPU compared to the Vostro. I don’t know what codecs you use for export: if it’s some raw format, this might be limited by disk transfer; if it’s encoded, it should be faster due to more cores, unless the codec only uses one or two threads.
In the general Intel case, it’s a question of whether real copies occur or whether the buffers are simply mmapped, since it’s all system memory. That would save a lot of time.
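In code the two variants would look roughly like this; a minimal sketch (hypothetical helpers, assuming a pixel unpack buffer of the right size is already bound):

```c
/* Minimal sketch: explicit copy vs. mapped write into a bound
 * GL_PIXEL_UNPACK_BUFFER. On an Intel iGPU the mapped pointer can be
 * the buffer's actual storage in system RAM, so the memcpy below is
 * the only copy; ideally a decoder would write straight into the
 * mapped pointer and skip even that. Hypothetical helpers, not LW code. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

/* Variant 1: hand the frame to the driver, which copies it. */
void upload_copy(const void *frame, size_t size)
{
    glBufferSubData(GL_PIXEL_UNPACK_BUFFER, 0, size, frame);
}

/* Variant 2: map the buffer and write the frame in place. */
void upload_mapped(const void *frame, size_t size)
{
    void *dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_INVALIDATE_BUFFER_BIT);
    if (dst) {
        memcpy(dst, frame, size);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    }
}
```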
All configs have SSDs installed. Only the Vostro shows some slowdowns after a few seconds of benchmarking. Out of curiosity, I also tried the Inspiron with its original 5400 rpm HDD: there was no significant difference in the results. All machines have at least 8 GB RAM.
The export codec was always the same: MP4 720p (the Lightworks “YouTube 720p” preset).
About “copies vs mmapped buffers”: I have no idea. I’ll ask the devs. Thank you very much for the hint.
When exporting a project, it boils down to
decode (cpu)->copy(GL?)->generate/apply effects(GL)->copy(GL?)->encode(cpu)
AFAIK it is something like that, in multiple threads.
The question is: where is the missing time spent? Maybe LW is using GL commands that are better optimized in Mesa, or maybe the de-/encoding is GPU-assisted on Mesa. Only the LW devs can tell.
I’ve been trying to find the possible bottleneck for weeks, and I’m still learning the differences between drivers. As you said, we can’t find the one function causing the slowdown, but we can try to narrow down the list of suspicious places. I’m writing here not for a complete solution, but to gather information, get confirmation that neither the Nvidia chip nor the driver is responsible for such bottlenecks, and pick up some hints.
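One of the suspicious places on my list is the GPU-to-CPU copy before encoding. As far as I understand, a plain synchronous glReadPixels stalls until the transfer completes, while reading through a pixel pack buffer lets the transfer overlap with other work. A minimal sketch of the asynchronous variant (my understanding only; hypothetical helpers, not Lightworks code):

```c
/* Minimal sketch: asynchronous framebuffer readback through a PBO.
 * glReadPixels into a bound GL_PIXEL_PACK_BUFFER returns immediately;
 * mapping the buffer later (e.g. one frame behind) picks up the
 * finished transfer. Assumes a current GL context. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

void readback_start(GLuint pbo, int w, int h)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    /* starts the GPU-to-host transfer without waiting for it */
    glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, (void *)0);
}

const void *readback_map(GLuint pbo)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    /* blocks only if the transfer hasn't finished yet; the caller
     * unmaps with glUnmapBuffer() after the encoder has consumed it */
    return glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
}
```

If Lightworks does this synchronously, a driver that handles the stall differently could explain part of the gap. Just a guess on my side, though.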
A small update: I’ve repeated the tests with kernel 4.12.14-1. The export test on the Intel GPU and the Quadro M620: both 1m28s. Playback during editing is smoother with the Quadro, though.