GeForce Driver 511.17 made my program stop working

Hello,

Today I updated the driver to 511.17.
The renderer I’m developing with OptiX 7.4.0 doesn’t work with this driver version.
Initially, the error messages say:

Error: Symbol '_ZN3vlr19mat_Rec709_E_to_XYZE' was defined multiple times. First seen in: '__direct_callable__NullBSDF_setupBSDF_and_186_more_ID10'
Error: Symbol '_ZN3vlr19mat_XYZ_to_Rec709_EE' was defined multiple times. First seen in: '__direct_callable__NullBSDF_setupBSDF_and_186_more_ID10'
...

These variables are __constant__, and their values are written directly in the source code, like this:

CUDA_CONSTANT_MEM HOST_STATIC_CONSTEXPR float mat_Rec709_E_to_XYZ[] = {
    0.4969f, 0.2562f, 0.0233f,
    0.3391f, 0.6782f, 0.1130f,
    0.1640f, 0.0656f, 0.8637f,
};

The program fails at optixPipelineCreate for the second pipeline created in the same OptiX context.
I tried adding the static qualifier to these variables. The messages above disappeared, but now it simply says "COMPILE ERROR: failed to create pipeline".

I don’t know whether my code has something that was already wrong but tolerated by older drivers, or whether this is a bug in the new driver.
It seems to be related to using __constant__ variables in multiple pipelines within the same CUDA context, but I’m not sure.

Here is the repository for my renderer, for reference:

Thanks,

Environment:
Windows 10 21H2
GeForce RTX 3080
CUDA 11.4

Hi @shocker.0x15,

I’m not yet aware of any reason this would be happening with the new driver. Normally, for constants you need to either declare them static, so that each module gets its own copy, or use extern. Have you tried declaring them extern?
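For illustration, here is a minimal sketch of the static option in a header shared by several modules. CONSTANT_MEM is a hypothetical stand-in for your CUDA_CONSTANT_MEM macro (I don’t know its exact definition); it expands to __constant__ under nvcc and to nothing in a host build.

```cpp
// Hypothetical stand-in for the CUDA_CONSTANT_MEM macro.
#if defined(__CUDACC__)
#   define CONSTANT_MEM __constant__
#else
#   define CONSTANT_MEM
#endif

namespace vlr {

// Shared header content. `static` gives the array internal linkage: every
// module (translation unit) that includes the header gets its own private
// copy, so two modules linked into one pipeline never export the same symbol.
// The trade-off is one 36-byte copy per including module.
static CONSTANT_MEM float mat_Rec709_E_to_XYZ[9] = {
    0.4969f, 0.2562f, 0.0233f,
    0.3391f, 0.6782f, 0.1130f,
    0.1640f, 0.0656f, 0.8637f,
};

}  // namespace vlr
```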

One possible cause of the vague error you get with static is that you’re running out of constant memory (the limit is 64 KB). Do you know how much constant memory you would consume if the data gets duplicated? I believe the total would include your launch params as well as every copy of your color conversion matrices. The _186_more_ in the error message made me wonder whether static, combined with heavy includes, has ended up duplicating your matrices 200 times or more. If you can, using extern would be ideal: just as with a host code linker, your constants would then be defined only once.
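A quick way to answer the sizing question is a back-of-envelope calculation. This is only an estimate: the matrix count, duplication factor, and launch-parameter size are placeholders to replace with your real numbers, and the function names are hypothetical.

```cpp
#include <cstddef>

// Constant-memory limit quoted above: 64 KB.
constexpr std::size_t kConstantMemoryLimit = 64 * 1024;  // bytes

// One 3x3 float color matrix: 9 * 4 = 36 bytes (float is 4 bytes in CUDA).
constexpr std::size_t kMatrixBytes = 9 * sizeof(float);

// Estimated bytes if each of `matrixCount` matrices is duplicated `copies`
// times, plus data that exists only once (e.g. launch parameters).
constexpr std::size_t estimateConstantBytes(std::size_t matrixCount,
                                            std::size_t copies,
                                            std::size_t launchParamBytes) {
    return matrixCount * copies * kMatrixBytes + launchParamBytes;
}

constexpr bool fitsInConstantMemory(std::size_t bytes) {
    return bytes <= kConstantMemoryLimit;
}
```

For example, ignoring launch parameters, two matrices duplicated 200 times come to 14,400 bytes, well under the limit, whereas ten matrices duplicated 200 times would need 72,000 bytes and exceed it.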

Have you verified whether the symbol really is multiply defined? If it is, and it only worked by accident with previous drivers, then it might be a latent problem that always needed fixing. Another possibility, though, is that this error message is misleading and something else is causing it.
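One way to check is to inspect the PTX each module is built from. A minimal sketch, assuming the standard PTX linking directives: an externally visible __constant__ definition is emitted on a line starting with `.visible .const`, a static one has no `.visible`, and an extern declaration uses `.extern`. The helper name is hypothetical and not part of any OptiX or CUDA API; please double-check the directives against your own generated PTX.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts externally visible __constant__ definitions of
// `symbol` (a mangled name) in one PTX listing. Static variables (no .visible)
// and extern declarations (.extern) are deliberately not counted.
constexpr std::size_t countVisibleConstDefinitions(std::string_view ptx,
                                                   std::string_view symbol) {
    constexpr std::string_view kDirective = ".visible .const";
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < ptx.size()) {
        // Extract the next line, without its trailing '\n'.
        std::size_t end = ptx.find('\n', pos);
        if (end == std::string_view::npos) end = ptx.size();
        std::string_view line = ptx.substr(pos, end - pos);

        // Skip indentation, then require the directive and the symbol.
        std::size_t first = line.find_first_not_of(" \t");
        if (first != std::string_view::npos) {
            line = line.substr(first);
            if (line.substr(0, kDirective.size()) == kDirective &&
                line.find(symbol) != std::string_view::npos) {
                ++count;
            }
        }
        pos = end + 1;
    }
    return count;
}
```

Summed over all modules linked into the same pipeline, a total greater than 1 for a name such as _ZN3vlr19mat_Rec709_E_to_XYZE would indicate a genuine multiple definition.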

There were some module linking improvements introduced in the 510 branch drivers, so it’s possible there’s a new bug on our end. Can I build directly from your GitHub repo to reproduce this issue? Are there any special instructions or steps, or is it straightforward from your README? Do we need to pinpoint a specific commit or branch to test against?


David.

Thanks for the reply.
I tried extern instead of static, and the error messages went back to the original "Symbol '_ZN3vlr19mat_Rec709_E_to_XYZE' was defined multiple times."
However, can I declare them extern while also assigning hard-coded values in CUDA? Or should I define the values in a separate file and load the module that contains the variables’ definitions?

    extern __constant__ float mat_Rec709_D65_to_XYZ[] = {
        0.4124564f, 0.2126729f, 0.0193339f,
        0.3575761f, 0.7151522f, 0.1191920f,
        0.1804375f, 0.0721750f, 0.9503041f,
    };

What is the correct way to define __constant__ variables only once when multiple kernels/pipelines refer to the same __constant__ variables defined in a header?

BTW, color matrices are obviously compile-time constants, so ideally I’d like them to be constexpr in the first place. Is there a way to declare global constexpr variables for use in CUDA kernels?

Regarding the renderer in the repository, the setup path isn’t very easy. You need to build and set up assimp and OpenEXR for the host program, while the renderer library itself can be built with only OptiX/CUDA. However, the issue occurs in the renderer’s initialization pass, so assimp and OpenEXR aren’t actually needed.

Best,

I created a dedicated branch for this issue so that you can try my renderer’s initialization directly.

  1. Open VLR.sln (I use VS 2019).
  2. Build and run the “Init” project.
  3. You’ll see the renderer fail to create a pipeline at line 698 in libVLR/context.cpp:
    p.pipeline.link(2, VLR_DEBUG_SELECT(OPTIX_COMPILE_DEBUG_LEVEL_FULL, OPTIX_COMPILE_DEBUG_LEVEL_NONE));

Thanks for the setup notes. I’ll probably try to repro in the next day or so.

Good question about constexpr. I don’t know exactly how well our tools support constexpr, but you should be aware that marking a global as constexpr or as const does not guarantee that the variable is compiled out. A single-value constant usually will be, but for an array it depends on whether you always access it with compile-time-constant indices. Any access with a variable index forces the compiler to produce a symbol and put the values in memory. An extern that can’t be resolved at compile time also forces the array into memory, so even if using extern solved the multiple-definition error, it might not be what you want.
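To make the indexing point concrete, here is a minimal sketch. The function names are hypothetical, HOST_DEVICE is a stand-in for __host__ __device__, and I’m assuming the column-major layout your posted arrays appear to use (the first three values are the XYZ of the red primary).

```cpp
// Stand-in so the same source builds with nvcc and a plain C++ compiler.
#if defined(__CUDACC__)
#   define HOST_DEVICE __host__ __device__
#else
#   define HOST_DEVICE
#endif

struct Vec3 { float x, y, z; };

// Every subscript is a literal, so after inlining the compiler can substitute
// the coefficients directly and has no reason to keep the array in memory.
HOST_DEVICE constexpr Vec3 rec709EToXYZ(Vec3 rgb) {
    constexpr float m[9] = {  // column-major (assumption)
        0.4969f, 0.2562f, 0.0233f,
        0.3391f, 0.6782f, 0.1130f,
        0.1640f, 0.0656f, 0.8637f,
    };
    return Vec3{
        m[0] * rgb.x + m[3] * rgb.y + m[6] * rgb.z,
        m[1] * rgb.x + m[4] * rgb.y + m[7] * rgb.z,
        m[2] * rgb.x + m[5] * rgb.y + m[8] * rgb.z,
    };
}

// Counter-example: unless inlining turns `i` into a constant, m[i] cannot be
// resolved during compilation, so the array has to be materialized in memory.
HOST_DEVICE inline float rec709EToXYZCoefficient(int i) {
    constexpr float m[9] = {
        0.4969f, 0.2562f, 0.0233f,
        0.3391f, 0.6782f, 0.1130f,
        0.1640f, 0.0656f, 0.8637f,
    };
    return m[i];  // caller must ensure 0 <= i < 9
}
```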

There might still be a way to convince the compiler not to produce a symbol for your matrix and to inline all references to its data. That means making sure you never allow index-based random access to the matrix, and that all the color conversion routines use matrices that are known at compile time. If you use a generic matrix-vector multiply and pass it a matrix pointer that can change dynamically, that might force the matrix data into memory because the pointer can’t be resolved at compile time. So you don’t really need constexpr syntax; what matters is making sure all array accesses are amenable to inlining.
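One way to keep a single generic multiply without a runtime pointer is to pass the matrix as a type whose coefficients are static constexpr scalars, so each call site knows its matrix at compile time. This is a sketch with hypothetical names; it relies on the CUDA rule that constexpr scalar variables (including static class members) without an execution-space annotation can be used directly in device code. The coefficients are your two posted matrices rearranged row-major, assuming the originals are column-major.

```cpp
#if defined(__CUDACC__)
#   define HOST_DEVICE __host__ __device__
#else
#   define HOST_DEVICE
#endif

struct Vec3 { float x, y, z; };

// Hypothetical matrix "tags": row-major coefficients as static constexpr scalars.
struct Rec709EToXYZ {
    static constexpr float m00 = 0.4969f, m01 = 0.3391f, m02 = 0.1640f;
    static constexpr float m10 = 0.2562f, m11 = 0.6782f, m12 = 0.0656f;
    static constexpr float m20 = 0.0233f, m21 = 0.1130f, m22 = 0.8637f;
};

struct Rec709D65ToXYZ {
    static constexpr float m00 = 0.4124564f, m01 = 0.3575761f, m02 = 0.1804375f;
    static constexpr float m10 = 0.2126729f, m11 = 0.7151522f, m12 = 0.0721750f;
    static constexpr float m20 = 0.0193339f, m21 = 0.1191920f, m22 = 0.9503041f;
};

// Generic multiply: the matrix is a template parameter, not a runtime pointer,
// so each instantiation sees its coefficients as compile-time constants.
template <typename M>
HOST_DEVICE constexpr Vec3 transform(Vec3 v) {
    return Vec3{
        M::m00 * v.x + M::m01 * v.y + M::m02 * v.z,
        M::m10 * v.x + M::m11 * v.y + M::m12 * v.z,
        M::m20 * v.x + M::m21 * v.y + M::m22 * v.z,
    };
}

// By contrast, a signature like `Vec3 transform(const float* m, Vec3 v)` called
// with a pointer that varies at run time would need the matrix in memory.
```

Usage would look like transform<Rec709D65ToXYZ>(rgb); adding a new conversion means adding a new tag type rather than a new global array.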

Aside from that, linking extern variables in CUDA works just like in C or C++: you define the symbol and its data in one place, and wherever you need access from a different compilation unit, you declare an extern symbol. The contents of the matrix (the float values) shouldn’t appear on the extern declaration.
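Here is a minimal sketch of that layout, with the file split shown as comments (the file names and the CONSTANT_MEM macro are hypothetical stand-ins). Note that in C++ an extern declaration that carries an initializer, like the one in the snippet above, is still a definition, which would explain why the multiple-definition error came back. One caveat for plain CUDA: extern __constant__ declarations are only allowed when compiling relocatable device code (-rdc=true); I’m assuming your OptiX module build meets whatever equivalent requirement applies.

```cpp
// Hypothetical stand-in for the CUDA_CONSTANT_MEM macro.
#if defined(__CUDACC__)
#   define CONSTANT_MEM __constant__
#else
#   define CONSTANT_MEM
#endif

// ---- color_matrices.h (hypothetical): included by every module ----
namespace vlr {
// Declaration only: extern, explicit bound, and no initializer, so including
// this header anywhere adds no new definition.
extern CONSTANT_MEM float mat_Rec709_E_to_XYZ[9];
}  // namespace vlr

// ---- color_matrices.cu (hypothetical): compiled into exactly one module ----
namespace vlr {
// The single definition, and the only place the values appear.
CONSTANT_MEM float mat_Rec709_E_to_XYZ[9] = {
    0.4969f, 0.2562f, 0.0233f,
    0.3391f, 0.6782f, 0.1130f,
    0.1640f, 0.0656f, 0.8637f,
};
}  // namespace vlr
```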


David.

Oh, it seems I stupidly forgot to push to the branch, sorry!