Total number of threads corresponding to launch configuration

starkr · January 31, 2020, 10:13am

Hi, I’ve been profiling our ray-tracing kernel with Nsight Compute and I’m having some difficulty interpreting the readings under the ‘Launch Statistics’ metrics.

Specifically, I tried shuffling the order of my launch dimensions to test how it affects the kernel’s performance, and Nsight Compute reported a noticeably different total number of threads launched for the kernel.

Let’s assume I have a [dim_x, dim_y, dim_z] launch configuration. Nsight reported a total of (dim_x * dim_y * dim_z) threads, with a block size of 64 threads. Shuffling the order to, e.g., [dim_z, dim_y, dim_x] results in a larger number of threads, still with a block size of 64.

The former launch configuration is quite a bit faster, which I initially attributed to reduced thread divergence (according to Nsight Compute, the number of instructions of various kinds issued during the kernel’s execution dropped substantially). But now I am wondering whether it has anything to do with the extra threads that show up for the latter configuration.

Where do these extra threads come from?
Are these dummy threads, or do they do actual work, and could they thus be responsible for the elevated number of issued instructions in the second configuration?

EDIT: I am using OptiX 6.0, driver 436.30
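
As a sanity check on the numbers described in the post above, here is a minimal host-side C++ sketch of the arithmetic. It assumes the launch is flattened into one linear index space and covered by blocks of 64 threads, as in a plain CUDA launch. How OptiX 6.0 actually maps a launch onto blocks is not stated in this thread, and the helper names `launch_product` and `padded_to_blocks` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Product of the three launch dimensions: the thread count one would
// expect from the launch configuration alone. Reordering the dimensions
// cannot change this value.
std::uint64_t launch_product(const std::array<std::uint64_t, 3>& dims) {
    return dims[0] * dims[1] * dims[2];
}

// ASSUMPTION: the launch is linearized and covered by fixed-size blocks.
// The last block is then filled up to a full block, so at most
// (block_size - 1) surplus threads appear, independent of dimension order.
std::uint64_t padded_to_blocks(std::uint64_t n, std::uint64_t block_size) {
    assert(block_size > 0);
    const std::uint64_t blocks = (n + block_size - 1) / block_size;  // ceiling division
    return blocks * block_size;
}
```

For example, with illustrative dimensions {100, 10, 3}, `launch_product` gives 3000 for either ordering, and `padded_to_blocks(3000, 64)` gives 3008. Under this assumption, block rounding alone would not explain a thread count that changes when the dimensions are shuffled.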

dhart · January 31, 2020, 10:08pm

How are you launching exactly in both cases? Using rtContextLaunch3D()?

The launch size should be exactly the product of the launch dimensions, no more, no less.

–
David.
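
To quantify a discrepancy like the one reported here, the thread count shown by the profiler can be compared with the product of the launch dimensions. The C++ sketch below also illustrates, purely as a hypothesis, one general way a thread count could become order-dependent: padding each dimension separately to a multiple of a tile size. The 8x8x1 tile (64 threads) is an arbitrary choice for illustration, not documented OptiX behavior, and `round_up`, `padded_tiled` and `surplus_threads` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Dim3 = std::array<std::uint64_t, 3>;

// Round n up to the next multiple of `multiple`.
std::uint64_t round_up(std::uint64_t n, std::uint64_t multiple) {
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}

// HYPOTHETICAL padding scheme: each dimension is rounded up to its own
// tile extent before multiplying. Unlike the plain product, the result
// depends on which dimension meets which tile extent.
std::uint64_t padded_tiled(const Dim3& dims, const Dim3& tile) {
    return round_up(dims[0], tile[0]) *
           round_up(dims[1], tile[1]) *
           round_up(dims[2], tile[2]);
}

// Surplus of a reported thread count over the exact product of the
// launch dimensions; zero means the reading matches the launch size.
std::int64_t surplus_threads(std::uint64_t reported, const Dim3& dims) {
    const std::uint64_t product = dims[0] * dims[1] * dims[2];
    return static_cast<std::int64_t>(reported) - static_cast<std::int64_t>(product);
}
```

With illustrative dimensions {100, 10, 3} and the assumed 8x8x1 tile, `padded_tiled` gives 104 * 16 * 3 = 4992 threads, while the shuffled order {3, 10, 100} gives 8 * 16 * 100 = 12800, against an exact product of 3000 in both cases. Whether anything like this happens in the launches discussed here would have to be confirmed from the actual launch code and the profiler output.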
