[REGRESSION] 535.43.02 Breaks CUDA (Blender, etc)

I upgraded to 535.43.02, and blender-benchmark now fails to run any benchmarks on the GPU, with a “failed to get benchmark json data” error. All CPU benchmarks work fine.

So I tried to render a .blend file. When I clicked “Render” with either CUDA or OptiX selected, it failed completely: the render window popped up but showed nothing other than the gray checkered (or whatever it is) background, and then Blender crashed altogether. I was running it from a terminal and got this error:

blender ~/Downloads/monster_under_the_bed_sss_demo_by_metin_seven.blend
Read prefs: /home/matt/.config/blender/3.5/config/userpref.blend
Read blend: /home/matt/Downloads/monster_under_the_bed_sss_demo_by_metin_seven.blend

WARNING: The enviroment variable SYCL_DEVICE_FILTER is deprecated. Please use ONEAPI_DEVICE_SELECTOR instead.
For more details, please refer to:
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#oneapi_device_selector

free(): invalid pointer
[1]    28996 IOT instruction (core dumped)  blender ~/Downloads/monster_under_the_bed_sss_demo_by_metin_seven.blend

When running blender-benchmark-cli --verbosity 3 I get the following error:

Device has compute preemption or is not used for display

This is an RTX 3090.

I switched back to the Vulkan beta drivers, 525.47.26, and all three things complete perfectly fine (literally ZERO other changes). No issue on 530.41.03 either.

I did make sure that nvoptix.bin was correctly installed in /usr/share/nvidia as well, and it was.

I will add that the SYCL_DEVICE_FILTER warning does NOT refer to any environment variable I’ve set. I’ve checked with env.

Thanks for reporting this. For future reference, we’re tracking it in internal bug number 4138596.

I tracked this down to an incompatibility between libjemalloc and libraries that are loaded with the RTLD_DEEPBIND flag. This scenario runs into that problem because Blender is linked with libjemalloc and libnvidia-ml.so.1 is loaded with RTLD_DEEPBIND, triggering the bug. For now, you can try to avoid the problem by forcing libnvidia-ml.so.1 to be loaded earlier:

LD_PRELOAD=/usr/lib/libnvidia-ml.so.1 blender
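
The path in the command above is where the library lives on an Arch install; other distributions may place it elsewhere. As a minimal sketch, assuming a glibc system where `ldconfig -p` lists the library and tags 64-bit entries with `x86-64`, the path can be looked up instead of hard-coded. The helper names `find_lib_path` and `run_with_nvml_preloaded` are made up for illustration and are not part of any driver tooling.

```shell
# Pick the 64-bit path for a given soname out of `ldconfig -p` style output.
# Reads the listing on stdin, so it can be tried on sample text as well.
# Assumption: 64-bit entries carry an "x86-64" tag, as on x86_64 glibc systems.
find_lib_path() {
    awk -v name="$1" '$1 == name && /x86-64/ { print $NF; exit }'
}

# Hypothetical wrapper: run a command with libnvidia-ml.so.1 preloaded,
# if the dynamic linker cache knows where it is.
run_with_nvml_preloaded() {
    nvml_path=$(ldconfig -p 2>/dev/null | find_lib_path libnvidia-ml.so.1)
    if [ -n "$nvml_path" ]; then
        LD_PRELOAD="$nvml_path" "$@"
    else
        echo "libnvidia-ml.so.1 not found in the linker cache" >&2
        return 1
    fi
}

# Usage: run_with_nvml_preloaded blender
```

This only changes how the library path is found; the workaround itself is still the LD_PRELOAD shown above.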

Can you please give 535.54.03 a try? This issue is missing a changelog entry but it should be fixed in this release.
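
To confirm which driver is actually loaded after the upgrade and reboot, something like the following sketch may help. It assumes `nvidia-smi` is on the PATH and that GNU `sort -V` is available for version comparison; `version_at_least` is a hypothetical helper name.

```shell
# True if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query the driver if nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_at_least "$installed" 535.54.03; then
        echo "driver $installed is 535.54.03 or newer"
    else
        echo "driver $installed predates 535.54.03"
    fi
fi
```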

Absolutely. Yeah, I read the changelog and didn’t see it, so I figured it got left out. I already built the driver packages on my Arch install; I just haven’t installed them and rebooted yet. I’ll post as soon as I do.

Okay, so it seems the issue is fixed. I am having quite a few other issues (not with this driver, just in general), but I’ll report those separately.