maxNumRangeTreeNodes confusion

/**
 * \brief Input parameter to define the counterDataImage
 */
typedef struct CUpti_Profiler_CounterDataImageOptions
{
    size_t structSize;                 //!< [in] CUpti_Profiler_CounterDataImageOptions_Params_STRUCT_SIZE
    void* pPriv;                       //!< [in] assign to NULL

    const uint8_t* pCounterDataPrefix; /**< [in] Address of CounterDataPrefix generated from NVPW_CounterDataBuilder_GetCounterDataPrefix().
                                            Must be align(8).*/
    size_t counterDataPrefixSize;      //!< [in] Size of CounterDataPrefix generated from NVPW_CounterDataBuilder_GetCounterDataPrefix().
    uint32_t maxNumRanges;             //!< [in] Maximum number of ranges that can be profiled
    uint32_t maxNumRangeTreeNodes;     //!< [in] Maximum number of RangeTree nodes; must be >= maxNumRanges
    uint32_t maxRangeNameLength;       //!< [in] Maximum string length of each RangeName, including the trailing NULL character
} CUpti_Profiler_CounterDataImageOptions;
maxNumRanges: I guess this is the maximum number of ranges CUPTI can collect in one session.
maxNumRangeTreeNodes: What does this variable mean, and what value should be assigned to it?

We can think of each range as a node in a tree. CUPTI supports nested range profiling, where we can add a (Push/Pop) range inside another range.

To understand both parameters, let's consider the case below:

cuptiProfilerPushRange("RangeA0")           // Push rangeA0 at nesting level 1
    Launch kernel A
    cuptiProfilerPushRange("RangeB0")       // Push rangeB0 at nesting level 2
        Launch kernel B0
        cuptiProfilerPushRange("RangeC0")   // Push rangeC0 at nesting level 3
            Launch kernel C0
        cuptiProfilerPopRange()             // Pop rangeC0
    cuptiProfilerPopRange()                 // Pop rangeB0
    cuptiProfilerPushRange("RangeB1")       // Push rangeB1 at nesting level 2
        Launch kernel B1
    cuptiProfilerPopRange()                 // Pop rangeB1
cuptiProfilerPopRange()                     // Pop rangeA0

We can visualize the range structure as a tree where every range is a node:

A0
|----------B0
|          |----------C0
|
|----------B1
^          ^           ^
(1)       (2)         (3)          <----- Nesting Level
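To make the tree concrete, here is a minimal sketch that replays the push/pop sequence above and records the nesting level and parent of each range. It is a standalone model, not CUPTI code: `RangeNode`, `RangeTree`, `push_range`, `pop_range`, `build_example` and `print_tree` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RANGES 16

/* Hypothetical record of one pushed range; not a CUPTI type. */
typedef struct {
    const char *name;
    int level;   /* nesting level, 1 = outermost */
    int parent;  /* index of the enclosing range, -1 for a root */
} RangeNode;

typedef struct {
    RangeNode nodes[MAX_RANGES];
    int count;              /* number of ranges pushed so far */
    int stack[MAX_RANGES];  /* indices of the currently open ranges */
    int depth;              /* number of currently open ranges */
} RangeTree;

/* Models a push: the new range nests inside whatever range is open. */
int push_range(RangeTree *t, const char *name)
{
    assert(t->count < MAX_RANGES);
    int idx = t->count++;
    t->nodes[idx].name = name;
    t->nodes[idx].level = t->depth + 1;
    t->nodes[idx].parent = t->depth > 0 ? t->stack[t->depth - 1] : -1;
    t->stack[t->depth++] = idx;
    return idx;
}

/* Models a pop: closes the innermost open range. */
void pop_range(RangeTree *t)
{
    assert(t->depth > 0);
    t->depth--;
}

/* Replays the push/pop sequence from the example above. */
void build_example(RangeTree *t)
{
    memset(t, 0, sizeof *t);
    push_range(t, "RangeA0");
    push_range(t, "RangeB0");
    push_range(t, "RangeC0");
    pop_range(t);               /* pop RangeC0 */
    pop_range(t);               /* pop RangeB0 */
    push_range(t, "RangeB1");
    pop_range(t);               /* pop RangeB1 */
    pop_range(t);               /* pop RangeA0 */
}

/* Prints one range per line, indented by its nesting level. */
void print_tree(const RangeTree *t)
{
    for (int i = 0; i < t->count; i++)
        printf("%*s%s (level %d)\n", 4 * (t->nodes[i].level - 1), "",
               t->nodes[i].name, t->nodes[i].level);
}
```

Calling `build_example` on a `RangeTree` and then `print_tree` lists the four ranges with A0 at level 1, B0 and B1 at level 2, and C0 at level 3, matching the diagram.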

While profiling, two parameters of the cuptiProfilerSetConfig API control which levels are collected: minNestingLevel and numNestingLevels.

Case 1: minNestingLevel = 1 and numNestingLevels = 3. CUPTI profiles all the ranges.

Case 2: minNestingLevel = 1 and numNestingLevels = 2. CUPTI profiles the ranges at levels 1 and 2, so C0 is ignored.

Case 3: minNestingLevel = 2 and numNestingLevels = 2. CUPTI ignores every range whose level is less than 2, so only B0, B1 and C0 are profiled and A0 is skipped.
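The three cases can be checked with a small sketch of the level window they describe. This is only a model of the selection rule as stated in this post, not CUPTI's implementation; `is_profiled`, `count_profiled` and `kExampleLevels` are hypothetical names, and the sketch assumes both settings are at least 1.

```c
#include <assert.h>
#include <stddef.h>

/* Nesting levels of the example ranges, in push order: A0, B0, C0, B1. */
static const int kExampleLevels[] = {1, 2, 3, 2};
#define NUM_EXAMPLE_RANGES (sizeof kExampleLevels / sizeof kExampleLevels[0])

/* Returns 1 if a range at `level` lies inside the window
   [min_nesting_level, min_nesting_level + num_nesting_levels - 1]. */
int is_profiled(int level, int min_nesting_level, int num_nesting_levels)
{
    /* Assumption of this sketch: both settings are >= 1. */
    assert(min_nesting_level >= 1 && num_nesting_levels >= 1);
    return level >= min_nesting_level &&
           level < min_nesting_level + num_nesting_levels;
}

/* Counts how many of the example ranges a given configuration profiles. */
int count_profiled(int min_nesting_level, int num_nesting_levels)
{
    int n = 0;
    for (size_t i = 0; i < NUM_EXAMPLE_RANGES; i++)
        n += is_profiled(kExampleLevels[i], min_nesting_level,
                         num_nesting_levels);
    return n;
}
```

With this model, `count_profiled(1, 3)` covers all four ranges, while `count_profiled(1, 2)` and `count_profiled(2, 2)` each cover three, dropping C0 and A0 respectively.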

CUpti_Profiler_CounterDataImageOptions is used to create the counter data image, which stores the range data. For Case 3, we can set maxNumRanges = 3, since we are profiling 3 ranges, but maxNumRangeTreeNodes must be 4. CUPTI checks whether the root node is available before profiling its child nodes. Based on the minNestingLevel value, CUPTI skips profiling A0, but to build the tree structure it still needs to know the maximum number of range tree nodes, and that count includes A0.

In Case 2, where A0, B0 and B1 are the ranges profiled, both maxNumRanges and maxNumRangeTreeNodes will be 3.
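One way to read the rule above: maxNumRanges counts the ranges inside the nesting-level window, and maxNumRangeTreeNodes additionally counts any skipped ancestors those ranges hang from. The sketch below applies that reading to the example tree. It is an interpretation of this answer, not documented CUPTI behavior, and `ImageSizing`, `size_for_config`, `kLevel` and `kParent` are hypothetical names.

```c
#include <assert.h>

#define NUM_NODES 4

/* Example tree in push order: A0, B0, C0, B1. */
static const int kLevel[NUM_NODES]  = {1, 2, 3, 2};
static const int kParent[NUM_NODES] = {-1, 0, 1, 0};  /* -1 marks the root */

/* Hypothetical result type; the fields mirror the two option names. */
typedef struct {
    unsigned max_num_ranges;
    unsigned max_num_range_tree_nodes;
} ImageSizing;

ImageSizing size_for_config(int min_level, int num_levels)
{
    int needed[NUM_NODES] = {0};  /* 1 if the node must exist in the tree */
    ImageSizing s = {0, 0};

    for (int i = 0; i < NUM_NODES; i++) {
        int profiled = kLevel[i] >= min_level &&
                       kLevel[i] < min_level + num_levels;
        if (!profiled)
            continue;
        s.max_num_ranges++;
        /* Mark the range and every ancestor up to the root; stop early
           when an ancestor chain has already been marked. */
        for (int j = i; j != -1 && !needed[j]; j = kParent[j])
            needed[j] = 1;
    }
    for (int i = 0; i < NUM_NODES; i++)
        s.max_num_range_tree_nodes += (unsigned)needed[i];

    /* The header comment requires maxNumRangeTreeNodes >= maxNumRanges. */
    assert(s.max_num_range_tree_nodes >= s.max_num_ranges);
    return s;
}
```

Under this reading, `size_for_config(2, 2)` gives 3 ranges and 4 tree nodes (Case 3), and `size_for_config(1, 2)` gives 3 and 3 (Case 2), matching the values quoted in this answer.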

Note that all of the above matters only when doing nested range profiling. With Auto range profiling there is no nesting, so both parameters have the same value: the number of ranges profiled, which equals the number of kernel launches (in Auto range mode each kernel is treated as a range).
