How can I force my OptiX program to run on the GPU to improve performance?

Hello engineer.
I am encountering a performance problem with an OptiX program that simulates data.

The program displays its results in an OpenCV window. On Windows 10 with an RTX 2060 it reaches only about 1 frame per second, while the same code on Linux with a GTX 970 runs at about 50 frames per second. What causes this? Does OptiX code need to be forced onto the graphics card with a function like setDevice, as in traditional CUDA programs? I suspect that my ray tracing program is running on the CPU.

Looking forward to hearing from you soon.

Is anyone here?

Please provide more detailed information when asking questions on a developer forum.

What are your system configurations on the two systems?
OS version, installed GPU(s), VRAM amount, display driver version, OptiX major.minor.micro version, CUDA toolkit version used to generate the input PTX, host compiler version.

  • What did you do to analyze the problem?
  • Are the OptiX SDK examples running fast on either system?
  • How long does the ray tracing simulation take on each system (in milliseconds)?
  • How long does the display mechanism take on each system (in milliseconds)?
  • OptiX has no software fallback; only OptiX Prime has a CPU mode. Are you using the OptiX Prime API? That would not use the RTX board to its full potential.
  • Do you have multiple GPUs installed in the system?
  • If the display is slow, what is the exact mechanism to display the result?
  • How big is the result?
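
One way to answer the two timing questions is to wrap each phase of the render loop in a wall-clock timer and report the average milliseconds per frame. The sketch below is a generic helper, not part of OptiX or the SDK; `trace_frame` and `display_frame` are hypothetical placeholders for the real launch call and the OpenCV display code. It assumes the launch call returns only after the GPU work has finished; if it returns earlier, the "raytrace" number would understate the GPU time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <thread>

// Accumulates wall-clock time per named phase (e.g. "raytrace", "display")
// so the per-frame cost of each phase can be compared in milliseconds.
class PhaseTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Runs `work` once and adds its elapsed time to the total for `phase`.
    template <typename F>
    void measure(const std::string& phase, F&& work) {
        const auto start = Clock::now();
        work();
        const auto stop = Clock::now();
        totals_ms_[phase] +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }

    // Average milliseconds per frame for `phase`; 0 if it was never measured.
    double average_ms(const std::string& phase, int frames) const {
        assert(frames > 0);
        const auto it = totals_ms_.find(phase);
        return it == totals_ms_.end() ? 0.0 : it->second / frames;
    }

    // Prints one line per phase with its average cost per frame.
    void report(int frames) const {
        for (const auto& entry : totals_ms_)
            std::printf("%-10s %8.2f ms/frame\n", entry.first.c_str(),
                        average_ms(entry.first, frames));
    }

private:
    std::map<std::string, double> totals_ms_;
};

// Hypothetical placeholders: replace with the real launch call and the
// OpenCV display code from your render loop.
inline void trace_frame()   { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }
inline void display_frame() { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }

// Times `frames` iterations of the loop and prints the per-phase averages.
inline void profile_loop(int frames) {
    PhaseTimer timer;
    for (int i = 0; i < frames; ++i) {
        timer.measure("raytrace", trace_frame);
        timer.measure("display", display_frame);
    }
    timer.report(frames);
}
```

At 1 fps the two averages should add up to roughly 1000 ms per frame, which makes it obvious which phase owns the missing time.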


My friend’s Linux computer:
CPU: i5 (exact model unknown)
GPU: GTX 970
OS: Linux
CUDA: 9.2
Frame rate of the same program: about 50 fps

My computer:
CPU: i7-9750H 2.59 GHz
GPU: RTX 2060 6 GB, driver 441.66
OS: Windows 10
CUDA: 10.1
Frame rate of the same program: 1 fps

For PTX: I did not use the sutil::getPTX method from the SDK. Instead, I compile the PTX with the nvrtcCompileProgram() function, passing “compute_75” and other options. I have tested this in an SDK program without any problems.
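For comparison with another setup, it helps to keep the NVRTC options in one place. The sketch below only builds the option array that `nvrtcCompileProgram(prog, numOptions, options)` takes; the specific flags beyond the architecture are an assumption, not the poster's actual list, so compare them with the NVRTC flags your SDK build uses. Note that PTX generated for compute_75 only runs on GPUs with compute capability 7.5 or higher, so the GTX 970 (compute capability 5.2) build presumably uses a lower target. The include directory in the usage is a hypothetical placeholder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Owns the option strings and exposes the const char* array that
// nvrtcCompileProgram(prog, numOptions, options) expects.
struct NvrtcOptions {
    std::vector<std::string> storage;
    std::vector<const char*> pointers;

    void add(const std::string& opt) { storage.push_back(opt); }

    // Rebuilds the pointer view; the pointers stay valid only as long as
    // `storage` is not modified afterwards.
    const char* const* data() {
        pointers.clear();
        for (const auto& s : storage) pointers.push_back(s.c_str());
        return pointers.data();
    }

    int count() const { return static_cast<int>(storage.size()); }
};

// Builds an option list for PTX generation. `arch` is e.g. "compute_75"
// for an RTX 2060 (Turing) or "compute_52" for a GTX 970 (Maxwell).
// The fast-math and relocatable-device-code flags are assumptions.
inline NvrtcOptions make_ptx_options(const std::string& arch,
                                     const std::vector<std::string>& include_dirs) {
    assert(!arch.empty());
    NvrtcOptions opts;
    opts.add("--gpu-architecture=" + arch);
    opts.add("--use_fast_math");
    opts.add("--relocatable-device-code=true");
    for (const auto& dir : include_dirs) opts.add("--include-path=" + dir);
    return opts;
}
```

Usage would look like `auto opts = make_ptx_options("compute_75", {"/path/to/optix/include"}); nvrtcCompileProgram(prog, opts.count(), opts.data());`, with the pointer array built only after all options are added.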

Next, I used the Visual Studio 2017 performance profiler to analyze both the SDK’s optixWhitted sample and my own problematic program.

For the optixWhitted program: GPU usage slowly climbs to about 90% and stabilizes, while CPU usage is relatively low, which is ideal.
For my own program: CPU usage is also relatively low, but GPU usage stays below about 10% and barely fluctuates.

I don’t know what keeps my GPU almost idle, and I’m sure I’m not using OptiX Prime.

My program uses an OpenCV window to display each frame of the image. With performance acceleration, the frame rate is higher.
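Since the display path and the size of the result are open questions, here is a sketch of one common display step: converting the ray tracer's output into an 8-bit image that OpenCV can show. It assumes the output buffer holds float RGBA values in [0, 1] (the thread does not confirm the format), and `rgba_float_to_bgr8` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts a float RGBA image (4 floats per pixel, values in [0, 1]) into
// 8-bit BGR, the channel order OpenCV uses for CV_8UC3 images.
inline std::vector<unsigned char> rgba_float_to_bgr8(const std::vector<float>& rgba,
                                                     int width, int height) {
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    assert(rgba.size() == pixels * 4);
    std::vector<unsigned char> bgr(pixels * 3);

    // Clamps to [0, 1] and rounds to the nearest 8-bit value.
    auto to_byte = [](float v) {
        return static_cast<unsigned char>(std::lround(std::clamp(v, 0.0f, 1.0f) * 255.0f));
    };

    for (std::size_t i = 0; i < pixels; ++i) {
        bgr[3 * i + 0] = to_byte(rgba[4 * i + 2]);  // B
        bgr[3 * i + 1] = to_byte(rgba[4 * i + 1]);  // G
        bgr[3 * i + 2] = to_byte(rgba[4 * i + 0]);  // R (alpha is dropped)
    }
    return bgr;
}
```

The result can be wrapped without copying as `cv::Mat(height, width, CV_8UC3, bgr.data())` and passed to `cv::imshow`. If the loop also calls `cv::waitKey`, keep in mind that its argument is a delay in milliseconds, so that value bounds the achievable frame rate.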

If any details are missing, please ask and I will describe the problem more fully.

I don’t know how to approach this strange problem and would be grateful for your guidance. Thank you for taking the time out of your busy schedule!

That still doesn’t give enough insight into why something you programmed is not performing much better on a higher-end GPU.

If you profiled the issue, check which calls take the most time.
If this runs at only 1 fps, it should be enough to stop inside the debugger a few times and see whether that always hits the same function. Then single-step through that and see what exactly is happening.

  • If that is the OptiX launch call, then you would need to analyze what exactly happens around it.
  • Are you changing acceleration structures between launches?
  • Is there a kernel recompilation?
  • Is the scene huge and are you setting rtVariable values between launches? (Put them into a buffer and update the values in that instead.)
  • Enable the OptiX usage report callback at the highest verbosity level to get more information from OptiX.
  • Enable all OptiX exceptions and implement an exception program. Check whether any device exceptions are thrown.
  • Make sure there are no invalid rays. Check if all ray directions are valid and normalized.
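For the usage report suggestion, a minimal callback could look like the sketch below. It assumes the OptiX 6.x host API, where the callback receives a verbosity level, a tag, a message, and a user-data pointer, and is registered with `rtContextSetUsageReportCallback` (or `setUsageReportCallback` on the C++ wrapper) with verbosity 3 as, to my knowledge, the most detailed level. Passing a `std::string*` as user data to collect the log is a choice made only for this sketch. The registration lines stay in comments because they need the OptiX headers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Usage report callback: prints each message and, if `cbdata` is non-null,
// appends "tag: message" to the std::string it points to (an assumption of
// this sketch, not something OptiX requires).
inline void usage_report_callback(int level, const char* tag, const char* msg, void* cbdata) {
    assert(tag != nullptr && msg != nullptr);  // assumed to be non-null
    std::fprintf(stderr, "[%d][%-12s] %s", level, tag, msg);
    if (cbdata != nullptr) {
        auto* log = static_cast<std::string*>(cbdata);
        *log += tag;
        *log += ": ";
        *log += msg;
    }
}

// Registration (requires the OptiX 6 headers, so not compiled here):
//   std::string log;
//   rtContextSetUsageReportCallback(context, usage_report_callback, 3, &log);
// or with the C++ wrapper:
//   context->setUsageReportCallback(usage_report_callback, 3, &log);
```

Messages reported around each launch (for example, acceleration structure builds or recompilation) would then show up on stderr and in the collected log.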

1. The rays should be fine.
2. The possible states of each ray are set.
3. We do not recompile the kernel function.
4. This is called internally by OptiX.

I ran many tests with the profiler and found some time-consuming functions. The first problem, however, is that GPU usage stays consistently below 10% for the entire run, even though the OptiX module itself works correctly. The OpenCV math functions take up some time, but I want to set them aside for now.
I want to understand why the OptiX ray tracing and intersection results are computed correctly while GPU usage stays this low and flat.