GeForce driver 511.17 made my program stop working

Hello,

Today I updated the driver to 511.17.
The renderer I’m developing using OptiX 7.4.0 doesn’t work with this version of the driver.
Initially, the error messages say:

Error: Symbol '_ZN3vlr19mat_Rec709_E_to_XYZE' was defined multiple times. First seen in: '__direct_callable__NullBSDF_setupBSDF_and_186_more_ID10'
Error: Symbol '_ZN3vlr19mat_XYZ_to_Rec709_EE' was defined multiple times. First seen in: '__direct_callable__NullBSDF_setupBSDF_and_186_more_ID10'
...

These variables are __constant__, and their values are written directly in the source code like this:

CUDA_CONSTANT_MEM HOST_STATIC_CONSTEXPR float mat_Rec709_E_to_XYZ[] = {
    0.4969f, 0.2562f, 0.0233f,
    0.3391f, 0.6782f, 0.1130f,
    0.1640f, 0.0656f, 0.8637f,
};

The program failed at optixPipelineCreate for the second pipeline in the same OptiX context.
I tried adding static qualifiers to these variables and the above messages disappeared, but now it simply says COMPILE ERROR: failed to create pipeline.

I don’t know whether my code had something potentially wrong that older drivers happened to tolerate, or whether this is a bug in the new driver.
Using __constant__ variables across multiple pipelines in the same CUDA context seems to be somehow related, but I’m not sure.

This is the repository for my renderer, for reference:

Thanks,

Environment:
Windows 10 21H2
GeForce RTX 3080
CUDA 11.4

Hi @shocker.0x15,

I’m not yet aware of reasons this might be happening in the new driver. Normally for constants, you do need to either declare them static, so that each compilation unit gets its own duplicate, or use extern. Have you tried declaring them extern?

One possibility with the vague error when using static is that constant memory could be running out. (There is a size limit of 64KB.) Do you know how much constant memory you’d be consuming if it gets duplicated? I believe this would include your launch params as well as all copies of your color conversion matrix here. The _186_more_ in the error message got me wondering if static combined with heavy includes has ended up duplicating your matrix like 200x or more. Using extern would be ideal, if you can, just like with a host code linker, so that your constants are only defined once.
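To make that concrete with a rough sketch (the header name is hypothetical): a non-extern definition sitting in a header that many .cu files include gives every compilation unit its own copy of the data, and after linking all of those copies count against the same 64KB budget.

    // color_constants.h (hypothetical name), included by ~187 compilation units.
    // With 'static', every .cu that includes this header gets its own 36-byte copy
    // of the matrix in constant memory.
    static __constant__ float mat_Rec709_E_to_XYZ[] = {
        0.4969f, 0.2562f, 0.0233f,
        0.3391f, 0.6782f, 0.1130f,
        0.1640f, 0.0656f, 0.8637f,
    };

36 bytes per matrix is tiny on its own, but four matrices duplicated a couple hundred times is already tens of kilobytes, before launch params and anything else you keep in constant memory.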

Have you verified whether the symbol really is multiply defined? If it is, but it worked by accident with previous drivers, then it might be something latent that was always in need of a fix. But another possibility is that this error message is misleading and is being caused by something else.

There were some module linking improvements introduced in the 510 branch drivers, so it is possible there’s a new bug here on our end. Can I build directly from your GitHub repo to reproduce this issue? Are there any special instructions or steps? Is it straightforward from your README? Do we need to pinpoint a specific commit or branch to test against?


David.

Thanks for the reply.
I tried extern instead of static; the error messages go back to the original Symbol '_ZN3vlr19mat_Rec709_E_to_XYZE' was defined multiple times.
However, can I declare them with extern while still assigning hard-coded values in CUDA? Or should I define the values in another file and load the module containing the variables’ definitions?

    extern __constant__ float mat_Rec709_D65_to_XYZ[] = {
        0.4124564f, 0.2126729f, 0.0193339f,
        0.3575761f, 0.7151522f, 0.1191920f,
        0.1804375f, 0.0721750f, 0.9503041f,
    };

What is the correct way to define __constant__ variables only once when using multiple kernels/pipelines which refer to the same __constant__s defined in a header?

BTW, color matrices are obviously compile-time constants, so I ideally want them to be constexpr in the first place. Is there a way to declare global constexpr variables for CUDA kernels?

Regarding my renderer in the repository, it doesn’t have a very easy setup path. You need to build/set up assimp and OpenEXR for the host program, while the renderer library itself can be built with only OptiX/CUDA. That said, the issue happens in the initialization pass of the renderer, so assimp/OpenEXR are not actually needed.

Best,

I created a dedicated branch for this issue so that you can try my renderer’s initialization directly.

  1. Open VLR.sln (in my case VS 2019)
  2. Build/Run “Init” project.
  3. You’ll see the renderer fails to create a pipeline at line 698 in libVLR/context.cpp.
    p.pipeline.link(2, VLR_DEBUG_SELECT(OPTIX_COMPILE_DEBUG_LEVEL_FULL, OPTIX_COMPILE_DEBUG_LEVEL_NONE));

Thanks for the setup notes, I’ll maybe try to repro in the next day or so.

Good question about constexpr; I don’t know exactly how well our tools support constexpr. But you should be aware that when marking globals as constexpr or as const, neither will guarantee that the variable is compiled out. Usually a single-value constant will be compiled out, but in the case of an array, it depends on whether you always access the array with compile-time constant indices. If there are variable indices used to access the array, that will force the compiler to produce a symbol and to put the values in memory. An extern that can’t be resolved at compile time would also force the array into memory, so even if using extern solved the multiple-definitions error, it might not be what you want.

There might still be a way to convince the compiler to not produce a symbol for your matrix, and to get it to inline all references to this data. That would include making sure you don’t allow index-based random access to the matrix, and that all the color conversion routines use matrices that are known-at-compile-time. If you use a generic matrix-vector multiply, and just pass it a matrix pointer, and the pointer can change dynamically, then that might force the matrix data into memory because it can’t be resolved at compile time. So, you don’t really need to use constexpr syntax, it’s more about making sure all the array access is amenable to inlining.
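For example (just a sketch, not necessarily matching how your code is organized), a conversion routine that spells the coefficients out directly and never indexes an array at runtime leaves nothing the compiler has to keep in memory:

    // Sketch: bake the Rec.709(E)-to-XYZ coefficients into the multiply itself so the
    // compiler can fold them into immediates instead of emitting a __constant__ symbol.
    __device__ __forceinline__ float3 Rec709_E_to_XYZ(const float3 &rgb) {
        return make_float3(
            0.4969f * rgb.x + 0.3391f * rgb.y + 0.1640f * rgb.z,
            0.2562f * rgb.x + 0.6782f * rgb.y + 0.0656f * rgb.z,
            0.0233f * rgb.x + 0.1130f * rgb.y + 0.8637f * rgb.z);
    }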

Aside from that, linking extern variables in CUDA is just like in C or C++ - you would define the symbol and data in one place, and then wherever you need access from a different compilation unit, you would declare an extern symbol. The content of the matrix (float values) shouldn’t appear on the extern declaration.
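Schematically, and assuming the separately compiled modules actually get linked together (the pipeline link step for OptiX, or relocatable device code for plain CUDA), it would look something like:

    // Exactly one .cu file owns the data:
    __constant__ float mat_Rec709_E_to_XYZ[9] = {
        0.4969f, 0.2562f, 0.0233f,
        0.3391f, 0.6782f, 0.1130f,
        0.1640f, 0.0656f, 0.8637f,
    };

    // Every other compilation unit, via a shared header, sees only the declaration:
    extern __constant__ float mat_Rec709_E_to_XYZ[9];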


David.

Oh, it seems I stupidly failed to push to the branch, sorry!

What is the status?
What is the correct way to define a single __constant__ variable among multiple OptiX pipelines and pure CUDA kernels?

I tried another approach, for example, where I declare extern __constant__ variables in the header without the actual array contents, define the contents in a CUDA kernel file, and then load the module containing that kernel, but this failed as well.

Best,

Launch params are in constant memory and can be shared and used across multiple OptiX and CUDA kernels; would that work?

Is there other data, besides the color conversion matrices, that you definitely want residing in device memory? Or is this because the compiler is not letting you compile out the constant color conversion matrices?


David.

I’ll try launch param approach to see if it works.

Basically, I want to keep the code semantically clean. For example, the color matrices should be defined once, somewhere in a common header, with constexpr. This is doable in host-side code, and I believe it is how things should be. Compile-time array constants don’t seem to work well in CUDA code, at least for now, so I think the variables should be demoted to __constant__ with the contents hard-coded in the source code in this case. (__constant__ seems a more natural alternative than __device__.) This alternative approach had worked until I updated the driver, so I feel this is an issue.

Defining those variables in launch params may work, but it seems unnatural (why would color matrices be in “launch” params?) and like a workaround (so not what it should be in the end), particularly when seen from a pure CUDA kernel not directly related to OptiX.

I’d be happy if there is a more reasonable approach to this, or if this turns out to be just a bug in the driver.

Best,

Okay so I checked out your only_init branch, and was able to reproduce the issue you’re seeing. I don’t think you’re doing anything wrong. I think this is a bug on our end. The good news is that more recent driver builds have already fixed this problem, I verified it with two different drivers from this week. The latest release 511.23 does not fix it yet, but a newer driver is going to fix it at some point soon. The slightly bad news is that I don’t know when the fixed driver will be public, and I searched and asked around and couldn’t find anyone who knew anything about this bug. I’m guessing it’s likely a byproduct of something else that got broken & fixed.

So as to the larger question, declaring and referencing constant memory that is not part of the OptiX launch params is allowed and should work just fine. The main thing you’d need to do is mark it extern if it’s being included in multiple places, and to handle the copies to device constant memory yourself before the launch.

The thing to watch out for is running out of constant memory accidentally. It’s limited to 64KB, and the OptiX launch params are in there too, so it might be easy to run out if you used a large block and accidentally duplicated the memory by putting a non-extern definition in a header file and compiling it multiple times.
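For a plain CUDA module loaded through the driver API, “handling the copies yourself” would look roughly like this (a sketch only; use whatever module handle and symbol name you actually have, and note that a symbol inside a C++ namespace has to be looked up by its mangled name):

    #include <cuda.h>

    // Fill a module's __constant__ array before launching anything that reads it.
    void uploadColorMatrix(CUmodule module, const float values[9]) {
        CUdeviceptr sym = 0;
        size_t symSize = 0;
        // Error checking omitted for brevity.
        cuModuleGetGlobal(&sym, &symSize, module, "mat_Rec709_E_to_XYZ");
        cuMemcpyHtoD(sym, values, symSize);
    }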

I should clarify my suggestion about putting these things in launch params, it might be a bad idea. OptiX is managing the constant memory part. The buffer you pass to OptiX doesn’t live there, it’s copied to constant memory on launch. This means that it’s easily accessible from an OptiX program, but not easily accessible from a CUDA kernel. So if you need both, it would be better to allocate your own block(s) of constant mem.


David.

Wow, good to hear that this is likely a driver bug!
Thanks for investigating. I’m probably okay waiting for the fixed driver to be released, though I’m not sure how long it will take to become public.

My original code didn’t use extern, and if the same constants end up allocated multiple times because of that, it obviously seems better to fix.
Ideally, I would like to embed the contents of the constants into the source code.
When I declare the constants with extern, without the contents, in the header:

    // These matrices are column-major.
#if defined(VLR_Host)
    static constexpr float mat_Rec709_D65_to_XYZ[] = {
        0.4124564f, 0.2126729f, 0.0193339f,
        0.3575761f, 0.7151522f, 0.1191920f,
        0.1804375f, 0.0721750f, 0.9503041f,
    };
    static constexpr float mat_XYZ_to_Rec709_D65[] = {...};
    static constexpr float mat_Rec709_E_to_XYZ[] = {...};
    static constexpr float mat_XYZ_to_Rec709_E[] = {...};
#else
    extern __constant__ float mat_Rec709_D65_to_XYZ[9];
    extern __constant__ float mat_XYZ_to_Rec709_D65[9];
    extern __constant__ float mat_Rec709_E_to_XYZ[9];
    extern __constant__ float mat_XYZ_to_Rec709_E[9];
#endif

and define the actual contents in a dedicated kernel file, loading pure CUDA modules (including the dedicated module) succeeds, but the first OptiX pipeline fails at linking. Is this what you expect me to do, and does it currently fail only because of the driver bug? Or is embedding the contents of constants impossible, so I need to transfer the data from the host?
My constants are in namespace vlr; is there a way to get a symbol that is in a namespace with cuModuleGetGlobal?
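For reference, the dedicated kernel file I mentioned looks roughly like this (simplified sketch):

    // Definitions live in one dedicated module; all other code sees only the
    // extern declarations from the header above.
    namespace vlr {
        __constant__ float mat_Rec709_D65_to_XYZ[9] = {
            0.4124564f, 0.2126729f, 0.0193339f,
            0.3575761f, 0.7151522f, 0.1191920f,
            0.1804375f, 0.0721750f, 0.9503041f,
        };
        // ... and likewise for the other three matrices.
    }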

Thanks,

Just a note for anyone facing the same problem:
Today’s latest driver 511.65 seems to still have the issue.

@dhart Would it be possible to give a (rough) estimate for the release date of the driver that solves this issue? I’d like to know whether it is a matter of several weeks or of months.

Thanks,

FWIW, I confirmed the same with 511.65 and your reproducer. The internal driver version that I tested as working is numbered 515. This doesn’t rule out a fix in a 511 driver, but since I couldn’t find the actual bug report, it might indeed not get merged into the 510 branch. If the fix is only in 515, it will be a couple of months. I’ll double check whether someone on the team can try to track down the fix and merge it earlier.


David.


I confirmed that the latest 516.40 has resolved this issue.
I’ll mark this thread as resolved.


Thanks for following up and confirming, and sorry the release process took so long, but I’m glad it’s truly resolved.


David.
