About the CUPTI tutorial: https://docs.nvidia.com/cupti/tutorial/tutorial.html#gpu-performance-profiling-using-range-profiler-api

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

My CUDA version is 12.6, so I modified the context-creation call in the tutorial as follows:
CUctxCreateParams params = {};
params.execAffinityParams = nullptr;
params.numExecAffinityParams = 0;
params.cigParams = nullptr;

CUdevice device;
cuDeviceGet(&device, 0); // Get CUDA device 0 (adjust the ordinal for your system)
cuCtxCreate_v4(&g_cuContext, &params, 0, device); // Pass the device obtained above as the last argument

But the final output is:
Chip Name: AD102
Num of Passes: 1
Number of profiled ranges: 0

None of the metric output shown in the tutorial appears. Where is the problem? Can this tutorial only be used with CUDA 13.0, or is CUPTI AutoRange not supported in 12.6?
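On the version question: the tutorial's call `cuCtxCreate(&g_cuContext, (CUctxCreateParams*)0, 0, 0)` has four arguments, while in the CUDA 12.6 cuda.h quoted later in this thread `cuCtxCreate` is a macro for `cuCtxCreate_v2`, which takes `(CUcontext*, unsigned int flags, CUdevice)`. A minimal sketch of building one source file against either header is to branch on `CUDA_VERSION`, which cuda.h encodes as 1000 × major + 10 × minor (12060 for 12.6). The assumption, inferred from the tutorial's call rather than confirmed in this thread, is that 13.x headers expose the `CUctxCreateParams*` form. `decodeCudaVersion`, `ctxCreateTakesParams`, and the fallback `CUDA_VERSION` define are hypothetical and exist only so the sketch compiles without the toolkit. This only addresses compilation; it does not explain the zero-range result.

```cpp
// Sketch only: selecting the cuCtxCreate form from the CUDA header version.
// In real code, include <cuda.h> and drop the fallback below; cuda.h defines
// CUDA_VERSION as 1000 * major + 10 * minor (12060 for CUDA 12.6).
#ifndef CUDA_VERSION
#define CUDA_VERSION 12060  // hypothetical fallback so the sketch builds standalone
#endif

struct CudaVersion {
    int majorVersion;
    int minorVersion;
};

// Hypothetical helper: decode the CUDA_VERSION integer into major/minor parts.
constexpr CudaVersion decodeCudaVersion(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper. Assumption: 13.x headers expose cuCtxCreate with a
// CUctxCreateParams* argument (as the tutorial's call implies), while 12.x
// headers map cuCtxCreate to cuCtxCreate_v2(CUcontext*, unsigned int, CUdevice).
constexpr bool ctxCreateTakesParams(int encoded) {
    return decodeCudaVersion(encoded).majorVersion >= 13;
}

// Compile-time checks of the encoding for the header in this thread.
static_assert(decodeCudaVersion(12060).majorVersion == 12, "12060 is a 12.x header");
static_assert(decodeCudaVersion(12060).minorVersion == 6, "12060 is an x.6 header");

// With <cuda.h> included, the same check is usually done in the preprocessor,
// so only the matching call is compiled (cuInit must be called first):
//
//   cuInit(0);
//   CUdevice dev;
//   cuDeviceGet(&dev, 0);
//   #if CUDA_VERSION >= 13000
//   cuCtxCreate(&g_cuContext, nullptr, 0, dev);  // params pointer, flags, device
//   #else
//   cuCtxCreate(&g_cuContext, 0, dev);           // flags, device
//   #endif
```

The preprocessor form is the practical one, because a call with the wrong argument count for the installed header does not compile at all; the constexpr helpers only make the version rule explicit and checkable.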

Hi, @1978769439
Sorry for the late response. Can you clarify exactly what issue you are having?
Also, can you provide a minimal repro for us to reproduce it? Thanks!

No problem. Here is a detailed explanation of what I know so far.
I want to learn CUPTI's Range Profiling API. My environment is CUDA 12.6.3, CUPTI 12.6.80, and an RTX 4090 GPU.
As the documentation recommends for versions after 12.6, I am using the CUPTI Range Profiling API together with the CUPTI Profiler Host API. I have tried three different programs: the autorange_profiling and range_profiling samples in /usr/local/cuda-12.6/extras/CUPTI/samples, and the "GPU Performance Profiling using Range Profiler API" example from the tutorial in the official documentation.
Currently autorange_profiling runs normally, and it uses CUPTI AutoRange. However, I have not been able to get range_profiling or the "GPU Performance Profiling using Range Profiler API" example to work. Both of them also use auto-range mode, but their output reports no ranges.

range_profiling:
(sparse) wtyang@super:~/code/project/test/range_profiling$ ./range_profiling -d 0 -r auto -e kernel
Starting Range Profiling
Compute Capability of Device: 8.9
Num of Passes: 1
Range Mode: auto
Replay Mode: kernel
Total num of Ranges: 0

GPU Performance Profiling using Range Profiler API:
I added some intermediate output to check whether any step failed, but none of the calls reported an error, and the ranges are still not recognized. I had to modify this line from the tutorial, because it is not accepted with my CUDA 12.6 headers: cuCtxCreate(&g_cuContext, (CUctxCreateParams*)0, 0, 0); I checked cuda.h and found:
#define cuCtxCreate                         cuCtxCreate_v2
#define cuCtxCreate_v3                      cuCtxCreate_v3
#define cuCtxCreate_v4                      cuCtxCreate_v4
I think cuCtxCreate_v4 is the function to use here, and the parameters look correct, but after this change the program crashed:
Segmentation fault (core dumped).
I then changed how the CUctxCreateParams structure is initialized. These are the two variants I tried (the first one is commented out):
    // CUdevice device;
    // cuDeviceGet(&device, 0); 
    // cuCtxCreate(&g_cuContext, 0, 0);

    CUctxCreateParams params = {};
    params.execAffinityParams = nullptr;
    params.numExecAffinityParams = 0;
    params.cigParams = nullptr;

    CUdevice device;
    cuDeviceGet(&device, 0); 
    cuCtxCreate_v4(&g_cuContext, &params, 0, device); 

I tried both initialization methods, and the result is the same: no ranges are recognized.

(sparse) wtyang@super:~/code/cupti_tutorial/GPU_Performance_Profiling_using_Range_Profiler_API$ ./a
Chip Name: AD102
Num of Passes: 1
cuptiRangeProfilerSetConfig result: 0
Counter data image pointer: 0x562b8abd5800
Counter data image size: 234164
cuptiRangeProfilerGetCounterDataInfo result: 0
Number of profiled ranges: 0
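The intermediate "result: 0" prints above are a manual form of status checking. A minimal sketch of doing this uniformly is a small check macro that stops at the first failing call, so a failure is never silently followed by calls on invalid handles. It relies on the fact that both `CUDA_SUCCESS` (driver API `CUresult`) and `CUPTI_SUCCESS` (`CUptiResult`) have the value 0. `isSuccess`, `checkStatus`, and `CHECK_STATUS` are hypothetical names, and `setConfigParams` in the usage comment is a placeholder; in real code the numeric code could also be turned into text with `cuGetErrorString` or `cuptiGetResultString`.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: true when a status code means success. CUDA_SUCCESS
// (CUresult) and CUPTI_SUCCESS (CUptiResult) are both 0, so one test covers
// driver API and CUPTI calls alike.
template <typename Status>
constexpr bool isSuccess(Status status) {
    return static_cast<int>(status) == 0;
}

// Hypothetical helper: report the failing call and its numeric code, then
// abort, rather than printing the result and continuing.
template <typename Status>
void checkStatus(Status status, const char* expr, const char* file, int line) {
    if (!isSuccess(status)) {
        std::fprintf(stderr, "%s:%d: %s failed with code %d\n",
                     file, line, expr, static_cast<int>(status));
        std::abort();
    }
}

// Wraps a call so the source text of the expression appears in the message.
#define CHECK_STATUS(call) checkStatus((call), #call, __FILE__, __LINE__)

// Example usage with calls from this thread (requires the CUDA/CUPTI headers;
// setConfigParams is a placeholder for the already-filled parameter struct):
//   CHECK_STATUS(cuDeviceGet(&device, 0));
//   CHECK_STATUS(cuptiRangeProfilerSetConfig(&setConfigParams));
```

Aborting on the first non-zero code makes it easy to tell whether "0 ranges" comes from a call that failed or from calls that all succeeded but collected nothing.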

I compared autorange_profiling with the other two programs. Although all three use auto-range mode, they use different APIs:

**autorange_profiling:**
cuptiProfilerCounterDataImageCalculateSize
cuptiProfilerCounterDataImageInitialize
cuptiProfilerCounterDataImageCalculateScratchBufferSize
cuptiProfilerCounterDataImageInitializeScratchBuffer
cuCtxGetCurrent
cuptiProfilerBeginSession
cuptiProfilerSetConfig
cuptiProfilerEnableProfiling
DoVectorAddSubtract
cuptiProfilerDisableProfiling
cuptiProfilerUnsetConfig
cuptiProfilerEndSession

**GPU Performance Profiling using Range Profiler API:**
cuptiRangeProfilerEnable
cuptiProfilerHostInitialize
cuptiProfilerHostConfigAddMetrics
cuptiProfilerHostGetConfigImageSize
cuptiProfilerHostGetConfigImage
cuptiProfilerHostGetNumOfPasses
cuptiProfilerHostDeinitialize
cuptiRangeProfilerGetCounterDataSize
cuptiRangeProfilerCounterDataImageInitialize
cuptiRangeProfilerSetConfig
cuptiRangeProfilerStart
 VectorAdd
cuptiRangeProfilerStop
cuptiRangeProfilerDecodeData
cuptiRangeProfilerGetCounterDataInfo
cuptiRangeProfilerDisable

The Profiler API works normally, but the Profiler Host API + Range Profiler API combination does not seem to recognize any ranges.
Anyway, thank you for your reply.
Best wishes!

(base) wtyang@super:/usr/local/cuda-12.6$ cat version.json
{
"cuda" : {
"name" : "CUDA SDK",
"version" : "12.6.3"
},
"cuda_cccl" : {
"name" : "CUDA C++ Core Compute Libraries",
"version" : "12.6.77"
},
"cuda_cudart" : {
"name" : "CUDA Runtime (cudart)",
"version" : "12.6.77"
},
"cuda_cuobjdump" : {
"name" : "cuobjdump",
"version" : "12.6.77"
},
"cuda_cupti" : {
"name" : "CUPTI",
"version" : "12.6.80"
},
"cuda_cuxxfilt" : {
"name" : "CUDA cu++ filt",
"version" : "12.6.77"
},
"cuda_demo_suite" : {
"name" : "CUDA Demo Suite",
"version" : "12.6.77"
},
"cuda_gdb" : {
"name" : "CUDA GDB",
"version" : "12.6.77"
},
"cuda_nsight" : {
"name" : "Nsight Eclipse Plugins",
"version" : "12.6.77"
},
"cuda_nvcc" : {
"name" : "CUDA NVCC",
"version" : "12.6.85"
},
"cuda_nvdisasm" : {
"name" : "CUDA nvdisasm",
"version" : "12.6.77"
},
"cuda_nvml_dev" : {
"name" : "CUDA NVML Headers",
"version" : "12.6.77"
},
"cuda_nvprof" : {
"name" : "CUDA nvprof",
"version" : "12.6.80"
},
"cuda_nvprune" : {
"name" : "CUDA nvprune",
"version" : "12.6.77"
},
"cuda_nvrtc" : {
"name" : "CUDA NVRTC",
"version" : "12.6.85"
},
"cuda_nvtx" : {
"name" : "CUDA NVTX",
"version" : "12.6.77"
},
"cuda_nvvp" : {
"name" : "CUDA NVVP",
"version" : "12.6.80"
},
"cuda_opencl" : {
"name" : "CUDA OpenCL",
"version" : "12.6.77"
},
"cuda_sanitizer_api" : {
"name" : "CUDA Compute Sanitizer API",
"version" : "12.6.77"
},
"fabricmanager" : {
"name" : "Fabric Manager",
"version" : "560.35.05"
},
"libcublas" : {
"name" : "CUDA cuBLAS",
"version" : "12.6.4.1"
},
"libcufft" : {
"name" : "CUDA cuFFT",
"version" : "11.3.0.4"
},
"libcufile" : {
"name" : "GPUDirect Storage (cufile)",
"version" : "1.11.1.6"
},
"libcurand" : {
"name" : "CUDA cuRAND",
"version" : "10.3.7.77"
},
"libcusolver" : {
"name" : "CUDA cuSOLVER",
"version" : "11.7.1.2"
},
"libcusparse" : {
"name" : "CUDA cuSPARSE",
"version" : "12.5.4.2"
},
"libnpp" : {
"name" : "CUDA NPP",
"version" : "12.3.1.54"
},
"libnvfatbin" : {
"name" : "Fatbin interaction library",
"version" : "12.6.77"
},
"libnvidia_nscq" : {
"name" : "NvSwitch Library",
"version" : "560.35.05"
},
"libnvjitlink" : {
"name" : "JIT Linker Library",
"version" : "12.6.85"
},
"libnvjpeg" : {
"name" : "CUDA nvJPEG",
"version" : "12.3.3.54"
},
"libnvsdm" : {
"name" : "CUDA NVSDM",
"version" : "560.35.05"
},
"nsight_compute" : {
"name" : "Nsight Compute",
"version" : "2024.3.2.3"
},
"nsight_systems" : {
"name" : "Nsight Systems",
"version" : "2024.5.1.113"
},
"nvidia_driver" : {
"name" : "NVIDIA Linux Driver",
"version" : "560.35.05"
},
"nvidia_fs" : {
"name" : "NVIDIA file-system",
"version" : "2.22.3"
}
}

Hi, @1978769439

Can you please run range_profiling directly, without any parameters?

Yes, but the output still shows no recognized ranges:

$ ./range_profiling 
Starting Range Profiling
Compute Capability of Device: 8.9
Num of Passes: 1
Range Mode: auto
Replay Mode: user
Total num of Ranges: 0

Result verification passed.