I am trying to profile my application (it only performs loads and stores) using the CUDA event API. The execution time reported by CUDA events was similar to the time reported by Nsight Systems. However, after I disabled JIT compilation and recompiled with --gpu-architecture=compute_72 --gpu-code=sm_72 (I am on a Jetson AGX Xavier), the two no longer agree: for example, CUDA events report 6 microseconds while Nsight Systems reports 2 ms. The Nsight Systems values did not change when I disabled JIT.

Update: I discovered that the application works when I run it with sudo. The kernel actually executes when I launch the app with sudo. It seems the driver cannot access the binary without sudo (very weird).

Is your user a member of the “video” group? Debugging and profiling might still require sudo, but for normal operation you can generally access the GPU without sudo if you are a member of the proper group. What do you see from:
grep video /etc/group

You should see your user name there. If not, assuming your user name is ubuntu:
sudo usermod -aG video ubuntu

Then log out and back in (group changes only apply to new login sessions) and try again without sudo.
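If it helps, here is a minimal sketch of the same check as a script. The helper name has_group is made up for illustration. It reads the groups of the current session through id -nG instead of /etc/group, so it also shows whether a recent usermod change is active in this shell yet.

```shell
#!/bin/sh
# has_group is a hypothetical helper, not part of any NVIDIA tool.
# $1: space-separated list of group names, $2: group to look for.
# Succeeds only on an exact name match ("videoconf" does not count as "video").
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the group names of the current process.
if has_group "$(id -nG)" video; then
    echo "current session is in group video"
else
    echo "current session is NOT in group video"
fi
```

If this prints NOT even though grep shows your user name in /etc/group, the session most likely predates the usermod call.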

