Taking full advantage of CUDA cores and RT cores

optixLaunch is called in host code to launch a 1D, 2D or 3D array of threads on the device and invokes a ray generation program for each thread. Are these threads executed on CUDA cores? If so, how many threads can be started in parallel to make full use of the CUDA cores?
When the ray generation program invokes optixTrace, other programs are invoked to execute the traversal. How many rays can be traversed in parallel to make full use of the CUDA cores (for custom primitives) and the RT cores?

optixLaunch is called in host code to launch a 1D, 2D or 3D array of threads on the device and invokes a ray generation program for each thread. Are these threads executed on CUDA cores?

Yes, OptiX is using CUDA.

https://raytracing-docs.nvidia.com/optix7/guide/index.html#introduction#overview

If you read through the OptiX 7 Programming Guide and example source code, you’ll see that all resource management inside your host application is done with native CUDA host calls, either using the CUDA Runtime API or the CUDA Driver API.
OptiX uses a single-ray programming model and does all scheduling internally, so some native CUDA features, like shared memory access and warp synchronization instructions, are not allowed.
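
To make that concrete, here is a minimal host-side sketch in the style of the OptiX SDK samples. The Params struct, the launchFrame helper and the already-created pipeline, SBT and stream are assumptions for illustration; only optixLaunch and the CUDA Runtime calls are actual API:

```cpp
// Minimal host-side sketch (assumes the pipeline, SBT, stream and acceleration
// structure have been created beforehand, as in the OptiX SDK samples).
#include <optix.h>
#include <optix_stubs.h>
#include <cuda_runtime.h>

struct Params                        // hypothetical launch parameter block
{
    uchar4*                frame_buffer;
    unsigned int           width;
    unsigned int           height;
    OptixTraversableHandle handle;   // top-level acceleration structure
};

void launchFrame( OptixPipeline pipeline, const OptixShaderBindingTable& sbt,
                  CUstream stream, const Params& h_params )
{
    // Resource management is plain CUDA: allocate and upload the launch parameters.
    CUdeviceptr d_params = 0;
    cudaMalloc( reinterpret_cast<void**>( &d_params ), sizeof( Params ) );
    cudaMemcpyAsync( reinterpret_cast<void*>( d_params ), &h_params, sizeof( Params ),
                     cudaMemcpyHostToDevice, stream );

    // One thread (and one ray generation program invocation) per launch index.
    optixLaunch( pipeline, stream, d_params, sizeof( Params ), &sbt,
                 h_params.width, h_params.height, 1 /* depth */ );

    cudaStreamSynchronize( stream );
    cudaFree( reinterpret_cast<void*>( d_params ) );
}
```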

Then all device code you implement for the different program domains (raygen, exception, closest hit, any hit, intersection, miss, direct or continuation callables) is written in CUDA C++ device code.

All these device programs run on the Streaming Multiprocessors (SM) of the GPU.
On RTX GPUs there are also RT cores which handle traversal through the acceleration structures and the ray-triangle intersection calculation in hardware.
Then there are Tensor cores which are used for the OptiX Denoiser.
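
For illustration, here is a minimal sketch of what such device code can look like. The Params struct and the single-payload hit/miss convention are assumptions mirroring the SDK samples; only the optix* device functions and the __raygen__/__miss__ semantic prefixes come from OptiX itself:

```cpp
// Minimal device-side sketch of a ray generation and miss program (CUDA C++).
// The Params struct matches the hypothetical host-side example above and would
// normally live in a shared header.
#include <optix.h>

struct Params
{
    uchar4*                frame_buffer;
    unsigned int           width;
    unsigned int           height;
    OptixTraversableHandle handle;
};

extern "C" __constant__ Params params;   // filled from the optixLaunch pipelineParams

extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();       // this thread's launch index
    const uint3 dim = optixGetLaunchDimensions();

    // Hypothetical orthographic ray straight down the -z axis, one per pixel.
    const float3 origin    = make_float3( (float) idx.x / dim.x,
                                          (float) idx.y / dim.y, 1.0f );
    const float3 direction = make_float3( 0.0f, 0.0f, -1.0f );

    unsigned int p0 = 1u;   // payload register: assume hit, the miss program clears it
    optixTrace( params.handle, origin, direction,
                0.0f, 1e16f, 0.0f,                 // tmin, tmax, ray time
                OptixVisibilityMask( 255 ), OPTIX_RAY_FLAG_NONE,
                0, 1, 0,                           // SBT offset, SBT stride, miss index
                p0 );

    const unsigned char c = p0 ? 255u : 0u;
    params.frame_buffer[idx.y * params.width + idx.x] = make_uchar4( c, c, c, 255u );
}

extern "C" __global__ void __miss__ms()
{
    optixSetPayload_0( 0u );   // mark the ray as a miss
}
```

In a real renderer the closest-hit, any-hit and intersection programs would write more payload data; this sketch only distinguishes hit from miss.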

If so, how many threads can be started in parallel to make full use of the CUDA cores?

The OptiX launch dimension has a limit of 2^30.

https://raytracing-docs.nvidia.com/optix7/guide/index.html#limits#limits
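
Taking that limit as the product of the launch width, height and depth, a small host-side sanity check could look like this (a sketch; the helper name and example values are arbitrary):

```cpp
// Sketch: the product of the launch dimensions must not exceed 2^30
// (see the limits link above).
#include <cassert>

inline void checkLaunchSize( unsigned int width, unsigned int height, unsigned int depth )
{
    const unsigned long long total =
        (unsigned long long) width * height * depth;
    assert( total <= ( 1ull << 30 ) );   // 2^30 == 1,073,741,824 launch indices
}

// Example: a 32768 x 32768 x 1 launch is exactly 2^30 indices and still allowed.
// checkLaunchSize( 32768u, 32768u, 1u );
```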

When the ray generation program invokes optixTrace, other programs are invoked to execute the traversal.
How many rays can be traversed in parallel to make full use of the CUDA cores (for custom primitives) and the RT cores?

That is completely abstracted by OptiX. It will launch as many threads from the given launch dimension in parallel as there are resources available on the underlying GPU.

Note that how many rays are required to saturate a modern GPU architecture depends on the underlying GPU.
Meaning there is a minimum number of threads required to make best use of the GPU hardware, which is directly related to how many CUDA cores a specific GPU has and how many registers are used inside a device program. Then there are other factors like memory bandwidth, cache sizes, etc. It’s complicated.
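
As a very rough sketch, you can query the relevant hardware properties with the CUDA Runtime API to get a lower bound on the launch size needed to keep all SMs busy; register pressure, occupancy limits and memory behavior will shift the real number:

```cpp
// Rough sketch: lower bound on how many threads are needed to fill the GPU,
// ignoring register pressure, occupancy limits, memory bandwidth, etc.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop{};
    cudaGetDeviceProperties( &prop, 0 );

    // Maximum number of threads that can be resident on the whole GPU at once.
    const int residentThreads = prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;

    std::printf( "%s: %d SMs, up to %d resident threads\n",
                 prop.name, prop.multiProcessorCount, residentThreads );

    // A launch dimension well above residentThreads (e.g. several times larger)
    // gives the scheduler enough work to hide latency.
    return 0;
}
```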

You’ll find more information about this inside the CUDA Programming Guide:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

Please use the search field in the top right of this forum to find more detailed information if you still have questions after reading the OptiX Programming Guide.
For example this thread:
https://forums.developer.nvidia.com/t/a-few-questions-about-bvh-traversal-engine-and-triangle-intersection-engine/233412