No kernels were profiled warning/problem

Greetings,

I’m trying to profile my application on a DGX box, on the 3rd (counting from 0) V100 it contains. When running, I get the warning “No kernels were profiled”. Any ideas what’s going on? This is with CUDA 10.0 and Ubuntu with the 4.4.0 kernel. I’m fairly sure the related .cu file was compiled with -G, but I’m under the impression that the kernel is profilable (at a high level) either way. The command and response follow the signature.

Thanks!
David

myuser@dgx-test:~/r/my_dir$ /usr/local/NVIDIA-Nsight-Compute-2019.4/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli --devices 3 --export "/home/myuser/r/my_dir/nsight_compute_prof1" --force-overwrite --target-processes all --kernel-regex my_kernel_name_copy_pasted --kernel-regex-base function --launch-skip-before-match 0 --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section WarpStateStats --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 --nvtx --profile-from-start 1 --clock-control base --apply-rules "/home/myuser/r/my_dir/my_binary" arg1 arg2 arg3
<output indicating the process is running>
==PROF== Connected to process 3442
my_particular_test: PASS (latency XXX things/s)
1 test passed
==PROF== Disconnected from process 3442
==WARNING== No kernels were profiled

My assumption would be that the combination of filters you are using causes no kernels in your application to match. Either the kernel “my_kernel_name_copy_pasted” is not running on device 3, or the name simply doesn’t match.

My suggestion would be to start with a simpler command line, since most of the parameters you are passing match the defaults anyway and likely aren’t necessary in your case. Start with

myuser@dgx-test:~/r/my_dir$ /usr/local/NVIDIA-Nsight-Compute-2019.4/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli --devices 3 --export "/home/myuser/r/my_dir/nsight_compute_prof1" --force-overwrite --target-processes all --section SpeedOfLight --apply-rules "/home/myuser/r/my_dir/my_binary" arg1 arg2 arg3

and see if that matches your kernel. If that works, check the kernel name that is shown and try adding back the kernel name filter.

--kernel-regex my_kernel_name_copy_pasted

Thanks for the idea @felix_dt!

As background, the way I generated that long command was by using the Nsight Compute GUI. BTW, it’s pretty nice that the GUI actually shows me the command it’s about to run, though I had to modify it manually to restrict execution to device 3.

Even without the kernel restriction, I still receive the warning of “no kernels were profiled”.

Since posting this, I’ve learned that sometimes Nsight/nvvp/nvprof struggles to profile on any device other than the default device. I don’t really have access to device 0 on this machine, but does that sound like it could be part of the issue? My guess is that the Nsight developers tend to develop for “device 0” and so are less likely to have completely tested the non-0 devices.

BTW, here’s my environment, and I’m able to confirm via nvidia-smi that the program is indeed running on Device 3.

myuser@dgx-test:~/$ env | grep "DEVICE"
CUDA_VISIBLE_DEVICES=3

I think the problem here might be the combination of CUDA_VISIBLE_DEVICES together with --devices.

Using the environment variable, you are instructing the CUDA driver that there should be only one device visible to CUDA applications (device 3 in your system), which will be made available to CUDA as the first device, i.e. device 0.

Using the --devices 3 option, you are instructing Nsight Compute to restrict profiling to the fourth device (the one with ID 3), but there aren’t four devices anymore at that point.

If you really only want your application to run on the device selected by CUDA_VISIBLE_DEVICES=3, there is no need for the --devices option (or it can be set to 0). If you want your application to run on all devices, but only profile on the device with ID 3, remove the CUDA_VISIBLE_DEVICES env variable and keep the --devices 3 option.
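For example (illustrative only, with the tool path shortened and most of your options dropped), either of these combinations is consistent:

CUDA_VISIBLE_DEVICES=3 nv-nsight-cu-cli --target-processes all --section SpeedOfLight "/home/myuser/r/my_dir/my_binary" arg1 arg2 arg3

nv-nsight-cu-cli --devices 3 --target-processes all --section SpeedOfLight "/home/myuser/r/my_dir/my_binary" arg1 arg2 arg3

(the second one with CUDA_VISIBLE_DEVICES unset)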

Ah ha, this is good insight to have, as it would be difficult for me to track this down on my own. I’m giving it a whirl at the moment, but there’s an unrelated issue that’s keeping me from running the test. I’ll give it another try in the morning and report back. Thanks!

@felix_dt, I was able to run the test just now. The test itself passes, but the profiler is having issues:

my_kernel: ==ERROR== Error: ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM

This is what motivated me to use the --devices 3 option.

I don’t have sudo access on this machine. I’ve never needed sudo to profile in the past, so I’m surprised I need it. However, that link states that it’s a relatively new requirement from the 418.43 driver, and a lot of my experience is with a driver that’s slightly older than that. I guess I’ll see if it makes sense for me to gain sudo access…

Seeing this error message now is actually a good thing, as it implies that the profiler is now finding a kernel (on your physical device 3, CUDA device 0). As you found, it’s a new requirement from the driver, and you will need to work with your machine owner to get access again using one of the options listed on the page:

  • run the profiler as root/sudo
  • temporarily load the kernel module with NVreg_RestrictProfilingToAdminUsers=0
  • permanently enable profiling for non-admin users with a file in /etc/modprobe.d (a minimal example follows below)
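For the last option, a minimal sketch of such a config file (the file name here is my own choice; a reboot or driver module reload is needed afterwards, see the linked page for the exact steps):

# /etc/modprobe.d/nvidia-profiling.conf (hypothetical file name)
options nvidia NVreg_RestrictProfilingToAdminUsers=0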

Hi @felix_dt

I encounter “==WARNING== No kernels were profiled” since I updated my CUDA version to 11.5 - is it possible that I removed something important when I purged the earlier driver to be able to update?

The CUDA program that I’m trying to profile is just a simple add, but using the following command I get that warning:

$ sudo nv-nsight-cu-cli --device 1 --target-process all --section SpeedOfLight a.out

I am testing on a DGX Station A100 machine.

Thanks

Is it possible that I removed when I purged the earlier Driver to be able to update?

I don’t fully understand what you mean by this. Are you saying you updated your display driver, and are now concerned that the CUDA toolkit used to build the app doesn’t work with that newer driver? Which driver version did you update to, and which Nsight Compute version are you using?

The warning indicates that Nsight Compute didn’t find any kernels to profile, given the specified flags. This may be because the app in fact does not launch any CUDA kernels, or because the filtering flags cause all of them to be ignored. You are passing --device 1. This tells Nsight Compute to only profile kernels on the 2nd device (numbering starts at 0, similar to CUDA_VISIBLE_DEVICES). Hence, unless the app launches kernels on at least two devices, it won’t find anything. I suggest removing this flag and checking if that solves your problem. If not, note that --devices refers to the list of devices used by the app, numbered sequentially. This is on top of any potentially set CUDA_VISIBLE_DEVICES environment variable, which may be used to filter the devices that are visible to CUDA in the first place.
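For example (a minimal sketch, assuming your binary is still a.out), the command without the device filter would be:

sudo nv-nsight-cu-cli --target-processes all --section SpeedOfLight a.out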


@felix_dt
Thanks for your response.
As I checked it again, ‘nvcc’ seems to ignore the kernel part. It does not enter the kernel; just the C++ part is compiled and running!!

CUDA version 11.5
Driver Version 495.29.05

CUDA_VISIBLE_DEVICES=1

Ok, sounds like you can hopefully fix the problem in your build command then.

CUDA_VISIBLE_DEVICES=1

Just for completeness, this value would be “incompatible” with ncu --devices 1.
Starting from the two (or more) devices in your machine:
(0) (1)
CUDA_VISIBLE_DEVICES=1 makes only the second one visible to CUDA, and it becomes index 0:
(1) → (0)
--devices 1 then tells ncu to profile only kernels on the 2nd GPU (the one with index 1), but after this remapping no device with that index exists anymore, so nothing is profiled.


As I checked the running code, it appears you were right in your first reply that no kernel runs! Indeed, nvcc does not seem to be working; it just does not compile the kernel! I compiled it and tried to run it, but got no results. It is just like running the code without its GPU kernels?! Does the compiler have problems?

If you have the code available, you can share it here for us to have a look. It’s not possible to debug this generically. Some things to try:

  • Pay attention to compiler errors/warnings
  • Make sure compilation succeeds and re-builds your binary
  • Make sure you execute the right binary
  • Add a printf or similar in your code where you expect it to launch your kernel to see if that path is executed.

This is the code

#include <stdio.h>

__global__
void saxpy(int n, float a, float *x, float *y)
{
  printf("thread %d", threadIdx.x);
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
  printf("Entering kernel!");
  // Perform SAXPY on 1M elements
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  printf("Out of Kernel");
  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = max(maxError, abs(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}

and the result

Entering kernel!Out of KernelMax error: 2.000000

Compiled with “nvcc simpleKernel.cu -o out” command.

I don’t think there is any error in your code, but the output suggests that there is an error when running it on your system. I recommend checking the return codes of all CUDA API calls to ensure they return cudaSuccess. You can use a macro like the one below for convenience. You can also run your binary through cuda-gdb, which is part of the CUDA toolkit.

#define RUNTIME_API_CALL(apiFuncCall)                                   \
  do {                                                                  \
    cudaError_t _status = apiFuncCall;                                  \
    if (_status != cudaSuccess) {                                       \
      fprintf(stderr, "%s:%d: error: function %s failed with error %s (%d).\n", \
              __FILE__, __LINE__, #apiFuncCall,                         \
              cudaGetErrorString(_status), _status);                    \
    }                                                                   \
  } while (0)
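
Applied to your code, it could look roughly like this (an untested sketch; checking the launch itself additionally needs cudaGetLastError and a synchronization):

RUNTIME_API_CALL(cudaMalloc(&d_x, N*sizeof(float)));
RUNTIME_API_CALL(cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice));

saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
RUNTIME_API_CALL(cudaGetLastError());        // reports kernel launch/configuration errors
RUNTIME_API_CALL(cudaDeviceSynchronize());   // reports errors that occur while the kernel runs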

@felix_dt
The error that pops up while entering the kernel is “CUDA Error: initialization error”.

@felix_dt
Thanks a lot for your help.
The problem was solved after I reinstalled the driver and the CUDA toolkit.
