About https://docs.nvidia.com/cupti/tutorial/tutorial.html#gpu-performance-profiling-using-range-profiler-api

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
```

My CUDA version is 12.6, so I modified the context creation call from the tutorial:

```cpp
CUctxCreateParams params = {};
params.execAffinityParams = nullptr;
params.numExecAffinityParams = 0;
params.cigParams = nullptr;

CUdevice device;
cuDeviceGet(&device, 0);                           // get CUDA device 0 (adjust the ordinal for your system)
cuCtxCreate_v4(&g_cuContext, &params, 0, device);  // pass the device handle as the last argument
```

However, the final output is:

```
Chip Name: AD102
Num of Passes: 1
Number of profiled ranges: 0
```

None of the metric values shown in the tutorial are printed. Where is the problem? Does this tutorial only work with CUDA 13.0, or is CUPTI_AutoRange not supported in 12.6?

Hi, @1978769439
Sorry for the late response. Can you clarify exactly what issue you are seeing?
Also, can you provide a minimal repro so we can reproduce it? Thanks!

No problem, I'll explain in detail what I know so far.
I want to learn CUPTI's Range Profiling API. My environment is CUDA 12.6.3 with CUPTI 12.6.80 on an RTX 4090 GPU.
As the documentation recommends for versions after 12.6, I am using the CUPTI Range Profiling API together with the host-side CUPTI Profiler Host API. I have tried three different programs: the autorange_profiling and range_profiling samples in /usr/local/cuda-12.6/extras/CUPTI/samples, and the "GPU Performance Profiling using Range Profiler API" tutorial in the official documentation.
At present, autorange_profiling runs normally, and it uses CUPTI_AutoRange. However, I have not been able to get range_profiling or the "GPU Performance Profiling using Range Profiler API" tutorial working. Both of these also use CUPTI_AutoRange mode, but their output shows that no ranges are detected.

range_profiling:
```
(sparse) wtyang@super:~/code/project/test/range_profiling$ ./range_profiling -d 0 -r auto -e kernel
Starting Range Profiling
Compute Capability of Device: 8.9
Num of Passes: 1
Range Mode: auto
Replay Mode: kernel
Total num of Ranges: 0
```
GPU Performance Profiling using Range Profiler API:
I added some intermediate output to check whether any step failed, but I couldn't find a failure, and the ranges are still not detected. The line I had to modify in the tutorial is `cuCtxCreate(&g_cuContext, (CUctxCreateParams*)0, 0, 0);`, which is not accepted here. I checked cuda.h and found:

```cpp
#define cuCtxCreate                         cuCtxCreate_v2
#define cuCtxCreate_v3                      cuCtxCreate_v3
#define cuCtxCreate_v4                      cuCtxCreate_v4
```
I think I should use cuCtxCreate_v4 here, and the parameters look correct, but when I change the call, the program fails to run:
Segmentation fault (core dumped)
I then changed how CUctxCreateParams is initialized. These are the variants I tried:

```cpp
// CUdevice device;
// cuDeviceGet(&device, 0);
// cuCtxCreate(&g_cuContext, 0, 0);

CUctxCreateParams params = {};
params.execAffinityParams = nullptr;
params.numExecAffinityParams = 0;
params.cigParams = nullptr;

CUdevice device;
cuDeviceGet(&device, 0);
cuCtxCreate_v4(&g_cuContext, &params, 0, device);
```

I tried both initialization methods, but the result is the same: no ranges are detected.

```
(sparse) wtyang@super:~/code/cupti_tutorial/GPU_Performance_Profiling_using_Range_Profiler_API$ ./a
Chip Name: AD102
Num of Passes: 1
cuptiRangeProfilerSetConfig result: 0
Counter data image pointer: 0x562b8abd5800
Counter data image size: 234164
cuptiRangeProfilerGetCounterDataInfo result: 0
Number of profiled ranges: 0
```

I compared the autorange_profiling sample with the other two programs. Although all of them use auto-range mode, they call different APIs:

**autorange_profiling:**
cuptiProfilerCounterDataImageInitialize
cuptiProfilerCounterDataImageInitialize
cuptiProfilerCounterDataImageCalculateScratchBufferSize
cuptiProfilerCounterDataImageInitializeScratchBuffer
cuCtxGetCurrent
cuptiProfilerBeginSession
cuptiProfilerSetConfig
cuptiProfilerEnableProfiling
DoVectorAddSubtract
cuptiProfilerDisableProfiling
cuptiProfilerUnsetConfig
cuptiProfilerEndSession

**GPU Performance Profiling using Range Profiler API:**
cuptiRangeProfilerEnable
cuptiProfilerHostInitialize
cuptiProfilerHostConfigAddMetrics
cuptiProfilerHostGetConfigImageSize
cuptiProfilerHostGetConfigImage
cuptiProfilerHostGetNumOfPasses
cuptiProfilerHostDeinitialize
cuptiRangeProfilerGetCounterDataSize
cuptiRangeProfilerCounterDataImageInitialize
cuptiRangeProfilerSetConfig
cuptiRangeProfilerStart
 VectorAdd
cuptiRangeProfilerStop
cuptiRangeProfilerDecodeData
cuptiRangeProfilerGetCounterDataInfo
cuptiRangeProfilerDisable

The Profiler API works normally, but the Profiler Host API + Range Profiler API combination does not seem to detect any ranges.
Anyway, thank you for your reply.
Best wishes!

```
(base) wtyang@super:/usr/local/cuda-12.6$ cat version.json
{
  "cuda" : {
    "name" : "CUDA SDK",
    "version" : "12.6.3"
  },
  "cuda_cccl" : {
    "name" : "CUDA C++ Core Compute Libraries",
    "version" : "12.6.77"
  },
  "cuda_cudart" : {
    "name" : "CUDA Runtime (cudart)",
    "version" : "12.6.77"
  },
  "cuda_cuobjdump" : {
    "name" : "cuobjdump",
    "version" : "12.6.77"
  },
  "cuda_cupti" : {
    "name" : "CUPTI",
    "version" : "12.6.80"
  },
  "cuda_cuxxfilt" : {
    "name" : "CUDA cu++ filt",
    "version" : "12.6.77"
  },
  "cuda_demo_suite" : {
    "name" : "CUDA Demo Suite",
    "version" : "12.6.77"
  },
  "cuda_gdb" : {
    "name" : "CUDA GDB",
    "version" : "12.6.77"
  },
  "cuda_nsight" : {
    "name" : "Nsight Eclipse Plugins",
    "version" : "12.6.77"
  },
  "cuda_nvcc" : {
    "name" : "CUDA NVCC",
    "version" : "12.6.85"
  },
  "cuda_nvdisasm" : {
    "name" : "CUDA nvdisasm",
    "version" : "12.6.77"
  },
  "cuda_nvml_dev" : {
    "name" : "CUDA NVML Headers",
    "version" : "12.6.77"
  },
  "cuda_nvprof" : {
    "name" : "CUDA nvprof",
    "version" : "12.6.80"
  },
  "cuda_nvprune" : {
    "name" : "CUDA nvprune",
    "version" : "12.6.77"
  },
  "cuda_nvrtc" : {
    "name" : "CUDA NVRTC",
    "version" : "12.6.85"
  },
  "cuda_nvtx" : {
    "name" : "CUDA NVTX",
    "version" : "12.6.77"
  },
  "cuda_nvvp" : {
    "name" : "CUDA NVVP",
    "version" : "12.6.80"
  },
  "cuda_opencl" : {
    "name" : "CUDA OpenCL",
    "version" : "12.6.77"
  },
  "cuda_sanitizer_api" : {
    "name" : "CUDA Compute Sanitizer API",
    "version" : "12.6.77"
  },
  "fabricmanager" : {
    "name" : "Fabric Manager",
    "version" : "560.35.05"
  },
  "libcublas" : {
    "name" : "CUDA cuBLAS",
    "version" : "12.6.4.1"
  },
  "libcufft" : {
    "name" : "CUDA cuFFT",
    "version" : "11.3.0.4"
  },
  "libcufile" : {
    "name" : "GPUDirect Storage (cufile)",
    "version" : "1.11.1.6"
  },
  "libcurand" : {
    "name" : "CUDA cuRAND",
    "version" : "10.3.7.77"
  },
  "libcusolver" : {
    "name" : "CUDA cuSOLVER",
    "version" : "11.7.1.2"
  },
  "libcusparse" : {
    "name" : "CUDA cuSPARSE",
    "version" : "12.5.4.2"
  },
  "libnpp" : {
    "name" : "CUDA NPP",
    "version" : "12.3.1.54"
  },
  "libnvfatbin" : {
    "name" : "Fatbin interaction library",
    "version" : "12.6.77"
  },
  "libnvidia_nscq" : {
    "name" : "NvSwitch Library",
    "version" : "560.35.05"
  },
  "libnvjitlink" : {
    "name" : "JIT Linker Library",
    "version" : "12.6.85"
  },
  "libnvjpeg" : {
    "name" : "CUDA nvJPEG",
    "version" : "12.3.3.54"
  },
  "libnvsdm" : {
    "name" : "CUDA NVSDM",
    "version" : "560.35.05"
  },
  "nsight_compute" : {
    "name" : "Nsight Compute",
    "version" : "2024.3.2.3"
  },
  "nsight_systems" : {
    "name" : "Nsight Systems",
    "version" : "2024.5.1.113"
  },
  "nvidia_driver" : {
    "name" : "NVIDIA Linux Driver",
    "version" : "560.35.05"
  },
  "nvidia_fs" : {
    "name" : "NVIDIA file-system",
    "version" : "2.22.3"
  }
}
```

Hi, @1978769439

Can you please run range_profiling directly, without any parameters?

Yes, but the output still shows no ranges:

```
$ ./range_profiling
Starting Range Profiling
Compute Capability of Device: 8.9
Num of Passes: 1
Range Mode: auto
Replay Mode: user
Total num of Ranges: 0

Result verification passed.
```

Hi,
Can you try running it with sudo (as root)?

I'm not sure whether you are checking the return value of every CUPTI API call. If not, can you add that? It will help us debug the issue.