My NVIDIA card is used only for OpenCL work; my X session runs on the onboard Intel GPU (this is a desktop PC).
darktable detects the card and uses OpenCL correctly: I can see the OpenCL jobs running in the darktable log, and the processing speedup is noticeable. It keeps working as long as darktable keeps sending OpenCL jobs to the card. However, if I stop using darktable for a short period (about 20 seconds) and then continue, all subsequent OpenCL processing fails and darktable falls back to the CPU.
Whenever the failure occurs in darktable, an NVIDIA error message also appears (if I run ‘journalctl -f’ while processing in darktable), as follows:
It looks like it might be a driver issue. I downgraded to 390.87 and the issue remained; on 340.107, however, the issue went away. (These are the only driver versions packaged for Arch Linux, so they are the only ones I can easily test.)
Can you please attach an NVIDIA bug report (the output of nvidia-bug-report.sh), generated as soon as the issue hits, to your existing post? Please also share detailed reproduction steps so we can replicate this issue internally at NVIDIA. Which desktop environment are you running: GNOME, KDE or something else? Also, do you have a more recent NVIDIA GPU board to test with?
Thanks for the info. According to the log, you have an onboard Intel GPU and are running X on it. Does the Xid 31 issue reproduce if you use only the NVIDIA driver, with the monitor connected to the NVIDIA GPU?
I initially ran my X session on the NVIDIA GPU, but I hit a number of issues (screen artifacts, tearing, etc.), so I went back to using the card only for OpenCL. That was the only reason I bought it, and it ensures that all 2 GB of its memory is available to darktable. It was a while ago, but I believe darktable still reported the same OpenCL errors; I just hadn’t seen the related Xid error at that point because I didn’t run journalctl. X was generally unusable, and that was the more pressing issue at the time.
I moved X back to the onboard Intel graphics to resolve those issues, and that fixed everything except the Xid error.
I have now been using the 340.107 driver for some time (still running X on the Intel GPU and using the NVIDIA card only for OpenCL/darktable), and I can confirm I haven’t had any further errors.
Please check whether you can reproduce this issue with X running on the NVIDIA GPU and driver. I have filed bug 200458568 to track this issue.
I just tested with Ubuntu 18.04.1 LTS + GeForce GTX 770 using a random image, but I was not able to reproduce this issue. I’ll match your exact OS and try again. Can you share the image you are editing and the editing options you use?
I’ve just tried this with X running on the NVIDIA GPU and the latest (beta) driver. The Intel GPU was switched off in the BIOS, and all Xorg entries relating to it were switched to NVIDIA. When I repeat the same scenario (start editing, wait 20 seconds, then continue editing), X freezes and I’m forced to reboot. I can only assume it’s the same error, but I’m not able to produce a bug report. I’ve tried this a few times and can consistently reproduce the error (X freezes after 20 s).
The error is not specific to the image I’m editing. Just import any image, double-click it to open the darkroom view, and turn on a few modules (on the right-hand side of the screen). Zooming in and out with the scroll wheel makes darktable regenerate the image, and your terminal session will confirm which device it used (device 0). It might be worth double-checking that OpenCL is definitely enabled in darktable: if you run “darktable -d opencl”, it should, among other things, print “FINALLY: opencl is AVAILABLE on this system” to the terminal.
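Checking for that line can be scripted when the test is repeated many times. This is a minimal Python sketch, assuming only that darktable’s debug output has been captured to a text file (for example with “darktable -d opencl 2>&1 | tee dt.log”, where “dt.log” is a hypothetical file name); the function name “opencl_available” is also hypothetical, and the marker is the string quoted above.

```python
import sys
from typing import Iterable

# Line that, per the thread, "darktable -d opencl" prints when OpenCL is usable.
MARKER = "FINALLY: opencl is AVAILABLE on this system"


def opencl_available(lines: Iterable[str]) -> bool:
    """Return True if any line of darktable's debug output contains the marker."""
    return any(MARKER in line for line in lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_opencl.py dt.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print("OpenCL available" if opencl_available(log) else "OpenCL NOT reported")
```

A plain substring test is used rather than a regular expression so the check keeps working if darktable adds a prefix or trailing punctuation to the line.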
On Arch Linux I’m using the nvidia-lts, nvidia-utils and opencl-nvidia packages, which are the versions that work with the linux-lts (v4.14.76) kernel package (the kernel I use in preference to the default linux package that comes preinstalled with Arch).
The packages that do not show the error are nvidia-340xx-lts, nvidia-340xx-utils and opencl-nvidia-340xx.
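Since the behaviour differs between driver branches (390.87 and newer fail, 340.107 does not), it is worth confirming which NVIDIA kernel module is actually loaded after swapping packages. A minimal Python sketch, assuming the proprietary driver exposes “/proc/driver/nvidia/version” and that its first line contains “Kernel Module” followed by the version number; the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed location and format of the proprietary driver's version file, e.g.
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.66  <build date>"
VERSION_FILE = Path("/proc/driver/nvidia/version")
VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_driver_version(text: str) -> Optional[str]:
    """Extract the kernel-module version string, or None if it is not found."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def loaded_driver_version() -> Optional[str]:
    """Read the version of the currently loaded NVIDIA module, if any."""
    try:
        return parse_driver_version(VERSION_FILE.read_text())
    except OSError:
        # The file is absent (or unreadable) when the nvidia module is not loaded.
        return None


if __name__ == "__main__":
    print(loaded_driver_version() or "NVIDIA kernel module not loaded")
```

Reading the running module, rather than the installed package list, catches the case where a package was swapped but the old module is still loaded until reboot.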
I have tested with the latest Arch Linux + 4.18.16-arch1-1-ARCH kernel + X.Org X Server 1.20.3 + XFCE 4.12 + 410.66 driver + GK104 [GeForce GTX 770] + Dell Inspiron 5680 system [i7 CPU] + darktable, with a 4K as well as a 19x20 monitor, but I was not able to reproduce this issue. It looks like this issue is specific to your system (MSI Z87M-G43 (MS-7823), Intel® Z87 Express chipset, i5 CPU) or to the GPU you are using. Do you have any other GPU to test with? Who is the vendor of your GPU?
I’m not running the latest kernel but the LTS kernel (4.14.78), from the “linux-lts” package on Arch Linux. My GPU is the “MSI N770 TF 2GD5”, and I don’t have another GPU I can use. The monitor should not make a difference, since I’m using the Intel GPU to handle graphics in X.
Thanks for testing. Can you please test with the NVIDIA GPU handling graphics in X? Also, could you share a video recording showing how you reproduce this issue? I’m not sure why this issue seems so specific to your setup.
OK, here’s a video I made reproducing the issue on the latest Linux kernel and NVIDIA driver. In it I’m still running X on my onboard Intel graphics, since that’s the only way I can get reasonable output; I’ll retest with everything on NVIDIA when I have time. The image I’m using is a JPEG, and I’ve done nothing to it at all (no modules enabled in darktable).
The video shows the following steps:
1. Launch darktable with OpenCL debug output by running “darktable -d opencl” in a terminal.
2. Briefly alt-tab back to the terminal emulator to highlight the evidence that darktable is using OpenCL for processing (“opencl is AVAILABLE”).
3. Alt-tab back into darktable, which has opened in the lighttable view. This view lists images I’ve previously imported; if you haven’t imported an image yet, you can do so now via the import module on the left. Any JPEG will do, or you can use a supported raw file: either will exhibit the error.
4. Double-click an image to load it into the darkroom view. darktable uses OpenCL to generate the image in the centre of the screen and the preview in the upper left. I briefly alt-tab back to the terminal, and you can see that it generated the [thumbnail], [full] image and [preview], all using device 0 (the NVIDIA card).
5. Go back into darktable and wait 20 seconds or so without doing anything in it (I’m using the clock on my panel to time it).
6. When the 20 seconds are up, zoom in and out on the image a couple of times with the mouse scroll wheel. This makes darktable try to reprocess the image with OpenCL at each zoom level; after 10 failed attempts it falls back to the CPU and stops using the NVIDIA card.
7. Alt-tab back to the terminal to see the errors. Only the first three operations (listed in step 4) were processed with OpenCL. All subsequent operations failed, and the last four were processed on the CPU (device -1).
8. Close darktable (Ctrl-Q) and run journalctl to view the Xid error.
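The device assignments that are checked by eye in the terminal can also be tallied from a saved log. This is a sketch under a loudly stated assumption: it expects darktable’s “-d opencl” output to contain lines of the form “[pixelpipe_process] [full] using device 0”, consistent with the pipe names ([thumbnail], [full], [preview]) and device numbers (0 for the NVIDIA card, -1 for the CPU) described above, but the exact wording may differ between darktable versions, so adjust the regular expression to your own log. The function names are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# ASSUMED log format; adjust to what your darktable version actually prints.
PIPE_RE = re.compile(r"\[pixelpipe_process\] \[(\w+)\] using device (-?\d+)")


def pipe_runs(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (pipe, device) pairs in the order they appear in the log."""
    runs = []
    for line in lines:
        match = PIPE_RE.search(line)
        if match:
            runs.append((match.group(1), int(match.group(2))))
    return runs


def tally_devices(lines: Iterable[str]) -> Counter:
    """Count how many pipe runs used each device (-1 means the CPU)."""
    return Counter(device for _, device in pipe_runs(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python tally.py dt.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for pipe, device in pipe_runs(log):
            label = "CPU" if device == -1 else f"device {device}"
            print(f"{pipe:10s} -> {label}")
```

Keeping the runs in order, rather than only counting them, makes the switch from device 0 to device -1 after the idle period easy to spot.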
With the latest Linux kernel (4.19.4) and the latest beta driver (415.18), journalctl shows a slightly more verbose error, if this helps:
Nov 26 18:06:36 alpha kernel: NVRM: GPU at PCI:0000:01:00: GPU-b555bcb7-296e-a42a-e679-f16ee6b13677
Nov 26 18:06:36 alpha kernel: NVRM: Xid (PCI:0000:01:00): 31, Ch 00000009, intr 10000000. MMU Fault: ENGINE CE1 HUBCLIENT_CE1 faulted @ 0x2_0002f000. Fault is of type FAULT_UNSUPPORTED_APERTURE ACCESS_TYPE_WRITE
I’m not sure whether that means it’s a different error, but I reproduced it with the same method shown in my video above; all other output is identical to the video apart from the Xid error shown here.
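The kernel line posted above has a regular structure (PCI address, Xid number, channel, then a free-text fault description), so it can be picked out of journalctl output automatically, which helps when repeating the 20-second test many times. A minimal Python sketch: the regular expressions are written against the exact 415.18 line above and may need loosening for other driver versions, and the names “XidEvent” and “parse_xid” are hypothetical. Live kernel messages can be followed with “journalctl -k -f” and piped in.

```python
import re
import sys
from dataclasses import dataclass
from typing import Optional

# Written against the 415.18 line posted in the thread; other drivers may differ.
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>PCI:[0-9A-Fa-f:.]+)\): (?P<xid>\d+), (?P<detail>.*)")
CHANNEL_RE = re.compile(r"\bCh ([0-9A-Fa-f]+)")
FAULT_TYPE_RE = re.compile(r"Fault is of type (\w+)")


@dataclass
class XidEvent:
    pci: str
    xid: int
    channel: Optional[str]
    fault_type: Optional[str]
    detail: str


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return an XidEvent for an NVRM Xid line, or None for any other line."""
    match = XID_RE.search(line)
    if not match:
        return None
    detail = match.group("detail")
    channel = CHANNEL_RE.search(detail)
    fault = FAULT_TYPE_RE.search(detail)
    return XidEvent(
        pci=match.group("pci"),
        xid=int(match.group("xid")),
        channel=channel.group(1) if channel else None,
        fault_type=fault.group(1) if fault else None,
        detail=detail,
    )


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Usage (hypothetical script name): journalctl -k -f | python xid_watch.py -
    for raw in sys.stdin:
        event = parse_xid(raw)
        if event:
            print(f"Xid {event.xid} on {event.pci}: {event.fault_type or event.detail}", flush=True)
```

Fields that older drivers may not print (channel, fault type) are optional, so the same parser also accepts the shorter Xid lines reported with earlier driver versions, falling back to the raw detail text.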