Generate line tables and optimize with libnvvm

Is it possible to get libnvvm to optimize code AND generate line tables from provided debug info in LLVM IR?

I have a codegen backend for the Rust language that targets libnvvm. It can generate debug line info for use in tools such as Nsight Compute, and that works great, but there is one big problem: libnvvm does not seem to be able to optimize and generate debug info at the same time. When I provide debug locations in the NVVM IR, libnvvm generates a module whose debug info works great but whose code is slow (for example, 450 ms vs. 30 ms without debug info).

I am not using -generate-line-info, and passing -opt=3 explicitly makes no difference.

Is this an expected/known limitation, or should I be doing something else?

With the nvcc CUDA compiler, generating full debug symbols for device code (-G) disables all optimizations. -lineinfo (long form: --generate-line-info) only produces line-number information and does not disable optimizations.
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-altering-compiler-linker-behavior

I do not know about libnvvm specifically.
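
If libnvvm's documented compile options (-opt=, -g, -generate-line-info) split the same way as the nvcc ones, the option list to try would look roughly like the sketch below. It is in Rust since that is what the backend is written in. `nvvm_options` and `DebugInfo` are made-up names for illustration, and I have not verified that -generate-line-info together with -opt=3 actually keeps optimizations on when the IR carries debug locations.

```rust
use std::ffi::CString;

/// Hypothetical enum (not part of libNVVM): which debug option to request.
#[derive(Clone, Copy, PartialEq, Debug)]
enum DebugInfo {
    None,
    /// Line tables only, via `-generate-line-info`.
    LineTables,
    /// Full debug info, via `-g`.
    Full,
}

/// Hypothetical helper: builds the option strings that would be handed to
/// `nvvmCompileProgram`. The libNVVM documentation lists `-opt=0` and
/// `-opt=3` as the accepted optimization levels; this sketch does not
/// validate `opt_level`.
fn nvvm_options(opt_level: u8, debug: DebugInfo) -> Vec<CString> {
    let mut opts = vec![format!("-opt={}", opt_level)];
    match debug {
        DebugInfo::None => {}
        DebugInfo::LineTables => opts.push("-generate-line-info".to_string()),
        DebugInfo::Full => opts.push("-g".to_string()),
    }
    opts.into_iter()
        // The strings are built here and never contain an interior NUL byte.
        .map(|s| CString::new(s).expect("option has no interior NUL"))
        .collect()
}
```

To hand these to libnvvm through FFI, collect `as_ptr()` of each `CString` into a `Vec<*const c_char>` and pass that vector's pointer and length to `nvvmCompileProgram`, keeping the `CString`s alive until the call returns. `nvvm_options(3, DebugInfo::LineTables)` yields `-opt=3` and `-generate-line-info`.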

Yeah, I’d like to know what nvcc passes to libnvvm to do that; I seem to be unable to reproduce that behavior.