I hope this is the right forum to post this bug in.
I work with OptiX ray tracing. In our application, the PTX JIT compiler takes a significant chunk of time, which slows down application startup. The behavior varies with the platform and the graphics card in use, and I am unable to explain or fix it.
We have two systems, one with an RTX 2080 Ti and one with a Titan Xp. Both are on the same NVIDIA driver version. On the RTX 2080 Ti machine we don’t see any calls into libnvidia-ptxjitcompiler.so, but on the Titan Xp machine we do. Even when I change the NVCC flags to include the architecture of the Titan Xp, I do not see any improvement. How is it that this library is never called on the RTX 2080 Ti? What needs to be done on the Titan Xp machine to get the same startup performance as on the RTX 2080 Ti?
To be honest, on Windows it is very hard to even find out which libraries are being called. I tried Nsight Systems, but it does not show this level of detail, which is why I had to resort to Linux (where I used a tool called FlameGraph). If you can suggest a tool that lays out the calls into the different libraries on Windows, that would be very helpful. Back to PTX performance: on Windows I get very poor performance even on the RTX 2080 Ti. It does get better from the second run onward, so I guess some kind of caching is happening, but I still don’t see the same performance as on Linux with the RTX 2080 Ti.
Some numbers that we are getting (we start the application, render a fixed number of frames, 25 in this case, and then stop it):
Linux, RTX 2080 Ti: 0.05 s
Linux, Titan Xp: 5-6 s
Windows, RTX 2080 Ti: 5-6 s on the first launch, dropping to 3 s from the second launch of the same application onward.
These numbers stay the same even if I compile the PTX files for the same architecture as the GPU the application is running on.
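In case it helps anyone reproduce the comparison, here is a minimal sketch of how first-launch versus repeated-launch timings like the ones above could be collected. It is not our actual measurement code: the function name `time_launches` and the example command in the comment are hypothetical, and it only measures the wall-clock time of the whole process, not the time spent in the JIT compiler itself.

```python
import subprocess
import time


def time_launches(cmd, runs=3):
    """Launch `cmd` `runs` times in a row and return the wall-clock
    duration of each launch in seconds, in launch order."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the application's output; only the elapsed time matters here.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical usage (the binary name and flag are placeholders):
#   timings = time_launches(["./my_optix_app", "--frames", "25"], runs=3)
#   print("first launch: %.2f s" % timings[0])
#   print("later launches:", ["%.2f s" % t for t in timings[1:]])
```

The first entry only corresponds to a truly cold start if nothing from a previous run has been cached on the machine, so the first and later entries are best compared on a freshly set up system.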
I would highly appreciate it if someone could help me understand what exactly is going on across the different platforms and GPU architectures.
How can I get the same performance everywhere as on the Linux RTX 2080 Ti system, where the library is never called?