Nsys does not show kernel output

My system is V100 with the following information:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
NVIDIA Nsight Systems version 2021.5.2.53-28d0e6e

sudo sh -c "echo 2 >/proc/sys/kernel/perf_event_paranoid"
/bin/bash: /proc/sys/kernel/perf_event_paranoid: Read-only file system

Note that the value of perf_event_paranoid is currently 3.

Output:
Generated:
/home/build/Baseline.nsys-rep

That’s my command prefix:

nsys profile --capture-range=cudaProfilerApi --trace-fork-before-exec true --force-overwrite true -s cpu --cudabacktrace=true -t cuda,nvtx,osrt,cudnn,cublas -o Baseline

nsys status -e
Timestamp counter supported: No
Sampling Environment Check
Linux Kernel Paranoid Level = -1: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.0.0-1032-azure: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
Sampling Environment: OK

That’s the output:

No kernel profiles show at all – Any clue, please?

@rknight can you take a look at this?

Hi user27549,

Can you share your NSYS-REP file (i.e. the collection results) with us so we can take a closer look at what is happening on your system?

test.nsys-rep (39.0 MB)
Here you go - Please keep me posted

That’s a snapshot of the diagnostics:

Hi hossam.amer,

The trace-fork-before-exec switch is not entirely safe, and it does not guarantee traces. If you are using the Python multiprocessing module, could you switch to using spawn as the start method and avoid the trace-fork-before-exec switch?

from multiprocessing import set_start_method
# ...
if __name__ == '__main__':
    set_start_method('spawn')
    # ...

Also, several CUDA tracing bugs have been fixed in nsys since the 2021.5 release. Would it be possible for you to upgrade to the 2022.4 nsys release and try your collection again?

One more suggestion would be to simplify your nsys command line to help us narrow down the issue with CUDA tracing. Could you run this command to see if you achieve a clean CUDA trace?
nsys profile --capture-range=cudaProfilerApi --force-overwrite true -s none -t cuda -o moeBaseline python ...

Hi rknight,

Thanks for getting back to me.

Please note that I used the NVIDIA Nsight Systems 2021.5.1 for viewing the shared profile, but I still cannot see the kernel profile data.

I ran the suggested simplified command while adding the start and stop profile methods in the python script. That’s the output of nsys stats:
profileOut.txt (6.7 KB)

Here’s the torch code I used to start profiling:

import torch

d0 = torch.device("cuda")
with torch.cuda.device(d0):
    torch.cuda.profiler.cudart().cudaProfilerStart()
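
Since --capture-range=cudaProfilerApi relies on a matching start and stop call, here is a minimal sketch of a helper that brackets a region with both. The name capture_range is hypothetical and not from the thread; the helper takes the two calls as arguments so it can be exercised without a GPU. With PyTorch it is assumed you would pass torch.cuda.profiler.start and torch.cuda.profiler.stop.

```python
from contextlib import contextmanager


@contextmanager
def capture_range(start, stop):
    """Call start() on entry and stop() on exit, even if the body raises.

    `capture_range` is a hypothetical helper name, not an nsys or PyTorch API.
    """
    start()
    try:
        yield
    finally:
        stop()


# Assumed usage with PyTorch on a CUDA-capable machine:
#   import torch
#   with capture_range(torch.cuda.profiler.start, torch.cuda.profiler.stop):
#       ...  # the GPU work that nsys should capture
```

Using try/finally means the stop call is still issued if the profiled code raises, so the capture range is closed in that case too.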

That’s the profile output:
test2.nsys-rep (14.0 MB)

Unfortunately, no specific kernel(s) data can be shown.

I also tried the same nsys command with the vectorAdd sample from the CUDA 11.6 toolkit, and I face the same problem.

Please note the following:

cat /proc/sys/kernel/perf_event_paranoid
3

Is this the issue? If so, that file is read-only on my system, so I cannot edit it. However, nsys status -e reports the level as -1.

I am facing something similar to this forum link with vectorAdd

I also updated the nsys version to the 2022 version using the following commands:

apt-get update && apt-get install -y nsight-systems-2022.4.2

Then I profiled vectorAdd:

/opt/nvidia/nsight-systems/2022.4.2/target-linux-x64/nsys profile --force-overwrite true -s none -t cuda vectorAdd

Again, I get the same problem as with the 2021 version: no kernel profile data is shown. Here is the vectorAdd profile:
report2.nsys-rep (766.1 KB)
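
One way to check a report for kernel data without opening the GUI is to export it to SQLite and look for kernel rows. The sketch below assumes the report was exported with nsys export --type sqlite, and that traced kernels land in a table named CUPTI_ACTIVITY_KIND_KERNEL; both are assumptions about the export layout that should be verified against your nsys version. The function name count_kernel_rows is hypothetical.

```python
import sqlite3

# Assumed name of the kernel table in an nsys SQLite export.
KERNEL_TABLE = "CUPTI_ACTIVITY_KIND_KERNEL"


def count_kernel_rows(conn):
    """Return the number of kernel rows, or None if the table is absent."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (KERNEL_TABLE,),
    ).fetchone()
    if row is None:
        return None  # no kernel table at all: no kernel activity was recorded
    return conn.execute(f"SELECT COUNT(*) FROM {KERNEL_TABLE}").fetchone()[0]


# Assumed usage, after: nsys export --type sqlite --output report2.sqlite report2.nsys-rep
#   conn = sqlite3.connect("report2.sqlite")
#   print(count_kernel_rows(conn))
```

A result of None or 0 would match the symptom described here, while a positive count would suggest the data was collected and the problem is on the viewing side.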

Hi hossam.amer,

The perf_event_paranoid level does not matter in this case. It only affects CPU profiling operations.

Can you run the nvidia-smi command and post the results?

Can you also collect an injection log? To collect an injection log, run the following command:
/opt/nvidia/nsight-systems/2022.4.2/target-linux-x64/nsys profile --force-overwrite true -s none -t cuda -e NVLOG_CONFIG_FILE=/opt/nvidia/nsight-systems/2022.4.2/host-linux-x64/nvlog.config.template vectorAdd

The injection log will be named nsys-ui.log and will be found in your working directory. Please share the injection log with us.

Hi rknight,

Please find the injection log file.
nsys-ui.log (16.6 KB)

Output of nvidia-smi:

Can you do another experiment that can help us narrow down the issue?

Please follow the README.md instructions to build and use a cuda injection library in the attached cuda-injection-library-linux.tar.gz file. When you run your application following the Use instructions, please capture an INJECTION_LOG_FILE file and upload it to this Forum discussion.
cuda-injection-library-linux.tar.gz (5.9 KB)

I found CUPTI in /usr/local/cuda/lib64, added that directory to LD_LIBRARY_PATH, and the injection library compiled without errors.

That’s how I run vector add now:
CUDA_INJECTION64_PATH=/home/hossamamer/young/cuda-injection-library-linux INJECTION_LOG_FILE=/home/hossamamer/young/cuda-samples/Samples/0_Introduction/vectorAdd/log.txt ./test

This does not seem to produce the intended log file. Is the command correct?

Not sure if this is relevant, but when I now run make for vectorAdd, this is what I get:

/usr/local/cuda/bin/nvcc -ccbin g++ -I../../../Common -m64 --threads 0 --std=c++11 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -o vectorAdd.o -c vectorAdd.cu
nvcc fatal : Unsupported gpu architecture 'compute_90'
make: *** [Makefile:324: vectorAdd.o] Error 1

I compiled it like this instead:
nvcc -I../../../Common -o test vectorAdd.cu

The CUDA_INJECTION64_PATH environment variable should point to the library file itself, as follows:
CUDA_INJECTION64_PATH=/home/hossamamer/young/libToolsInjectionCuda.so
assuming you didn’t change the name of the resulting library created with the make command.

The vectorAdd application does not need to be recompiled to do this test.

Sure. After setting the environment variable, which command should I run?

The command would be
CUDA_INJECTION64_PATH=/home/hossamamer/young/libToolsInjectionCuda.so INJECTION_LOG_FILE=/home/hossamamer/young/cuda-samples/Samples/0_Introduction/vectorAdd/log.txt ./test

That was the command used:
CUDA_INJECTION64_PATH=/home/hossamamer/young/cuda-injection-library-linux/libToolsInjectionCuda.so INJECTION_LOG_FILE=/home/hossamamer/young/cuda-samples/Samples/0_Introduction/vectorAdd/log.txt ./test

That was the output:

00:24:23.586.954|19432|Lib.cpp:566[InitializeInjection]: Initializing CUDA tracing
00:24:23.601.866|19432|Lib.cpp:348[EnableCollection]: Starting collection
00:24:23.601.932|19432|Lib.cpp:580[InitializeInjection]: CUDA tracing initialized
00:24:23.803.869|19432|Lib.cpp:221[BufferRequested]: Buffer requested
00:24:23.809.004|19432|Lib.cpp:339[EnableUvmActivity]: Initialized UVM
00:24:24.214.633|19432|Lib.cpp:514[AtExitHandler]: Flushing CUPTI buffers on exit
00:24:24.217.414|19432|Lib.cpp:230[BufferCompleted]: Buffer completed
00:24:24.217.474|19432|Lib.cpp:116[ProcessActivityRecord]: Device record received
00:24:24.217.485|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetCount’
00:24:24.217.493|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGet’
00:24:24.217.500|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetName’
00:24:24.217.508|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceTotalMem_v2’
00:24:24.217.516|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.532|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.540|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.546|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.554|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.561|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.568|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.574|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.581|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.588|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.595|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.602|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.609|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.616|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.623|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.630|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.645|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.653|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.660|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.666|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.673|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.681|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.688|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.694|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.701|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.708|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.715|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.721|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.748|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.758|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.764|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.771|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.778|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.785|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.792|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.799|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.806|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.813|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.820|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.827|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.833|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.841|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.848|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.855|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.862|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.869|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.875|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.883|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.889|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.896|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.903|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.910|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.917|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.924|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.931|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.938|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.945|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.951|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.958|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.965|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.972|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.979|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.986|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.992|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.217.999|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.006|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.016|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.023|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.030|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.036|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.043|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.050|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.057|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.064|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.071|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.078|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.085|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.092|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.099|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.106|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.113|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.120|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.127|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.134|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.141|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.148|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.155|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.162|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.177|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.184|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.191|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.197|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.204|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.211|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.218|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.225|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetUuid’
00:24:24.218.232|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.239|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.246|19432|Lib.cpp:160[ProcessActivityRecord]: Driver API record received: ‘cuDeviceGetAttribute’
00:24:24.218.253|19432|Lib.cpp:124[ProcessActivityRecord]: Device context record received
00:24:24.218.260|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaMalloc_v3020’
00:24:24.218.266|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaMalloc_v3020’
00:24:24.218.273|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaMalloc_v3020’
00:24:24.218.283|19432|Lib.cpp:136[ProcessActivityRecord]: Memory copy record received
00:24:24.218.291|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaMemcpy_v3020’
00:24:24.218.298|19432|Lib.cpp:136[ProcessActivityRecord]: Memory copy record received
00:24:24.218.305|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaMemcpy_v3020’
00:24:24.218.371|19432|Lib.cpp:185[ProcessActivityRecord]: Kernel launch record received: ‘vectorAdd(float const*, float const*, float*, int)’
00:24:24.218.379|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaLaunchKernel_v7000’
00:24:24.218.386|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaGetLastError_v3020’
00:24:24.218.394|19432|Lib.cpp:136[ProcessActivityRecord]: Memory copy record received
00:24:24.218.401|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaMemcpy_v3020’
00:24:24.218.408|19432|Lib.cpp:214[ProcessActivityRecord]: Processing CUPTI record kind 45
00:24:24.218.415|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaFree_v3020’
00:24:24.218.422|19432|Lib.cpp:214[ProcessActivityRecord]: Processing CUPTI record kind 45
00:24:24.218.429|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaFree_v3020’
00:24:24.218.436|19432|Lib.cpp:214[ProcessActivityRecord]: Processing CUPTI record kind 45
00:24:24.218.443|19432|Lib.cpp:169[ProcessActivityRecord]: Runtime API record received: ‘cudaFree_v3020’
00:24:24.218.450|19432|Lib.cpp:244[BufferCompleted]: All records were processed
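
The log above contains a "Kernel launch record received" line for vectorAdd, so the standalone injection library does receive kernel records from CUPTI. For longer logs, a minimal sketch like the following could tally the record kinds instead of reading them by eye. It assumes every line follows the timestamp|pid|Lib.cpp:line[function]: message layout seen above; the function name summarize_injection_log is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 00:24:24.218.283|19432|Lib.cpp:136[ProcessActivityRecord]: Memory copy record received
LINE_RE = re.compile(r"^\S+\|\d+\|Lib\.cpp:\d+\[\w+\]: (?P<msg>.*)$")


def summarize_injection_log(lines):
    """Count log messages by kind, ignoring the quoted API or kernel name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a log line
        # "Driver API record received: 'cuDeviceGet'" -> "Driver API record received"
        kind = match.group("msg").split(":", 1)[0]
        counts[kind] += 1
    return counts


# Assumed usage:
#   with open("log.txt") as f:
#       for kind, n in summarize_injection_log(f).most_common():
#           print(n, kind)
```

A nonzero count for "Kernel launch record received" indicates that kernel activity reached the injection library on that run.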

@mjain Can you take a look at this issue?

I want to update the thread: profiling vectorAdd works when I try it on a different computer.

That’s the output of nvidia-smi:

That’s the version of nsys:
NVIDIA Nsight Systems version 2022.3.1.43-b82618b

Now nsys produces the vectorAdd CUDA kernel profiles!

The following combination does not work:

Thanks a bunch for the update, hossam.amer. Appreciated!