Confused by the timings nvprof reports

When I profile my project with nvprof, the time under API calls is roughly 1000x larger than the time under GPU activities. When I time the same code with cudaEvent instead, the cudaEvent time is about 10x larger than nvprof's GPU activities time.
So which time is the right one? I am using Windows 10 with an NVIDIA GeForce GTX 1650 SUPER, the driver version is CUDA 11.3, and I run the test from the Windows command prompt (CMD).
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:   94.97%  42.271us      1  42.271us  42.271us  42.271us  my_kernel(void)
                    3.31%  1.4720us      1  1.4720us  1.4720us  1.4720us  [CUDA memcpy HtoD]
                    1.73%     768ns      1     768ns     768ns     768ns  [CUDA memcpy DtoH]
      API calls:   83.18%  172.65ms      1  172.65ms  172.65ms  172.65ms  cudaMemcpyToSymbol
                   16.49%  34.226ms      1  34.226ms  34.226ms  34.226ms  cuDevicePrimaryCtxRelease
                    0.22%  448.30us      1  448.30us  448.30us  448.30us  cudaDeviceSynchronize
                    0.05%  101.20us      1  101.20us  101.20us  101.20us  cuModuleUnload
                    0.03%  51.900us      1  51.900us  51.900us  51.900us  cudaLaunchKernel
                    0.02%  50.600us      1  50.600us  50.600us  50.600us  cudaMemcpyFromSymbol
                    0.01%  17.900us      1  17.900us  17.900us  17.900us  cuDeviceTotalMem
                    0.01%  12.500us    101     123ns       0ns     700ns  cuDeviceGetAttribute
                    0.00%  2.8000us      3     933ns     200ns  2.3000us  cuDeviceGetCount
                    0.00%  2.1000us      2  1.0500us     200ns  1.9000us  cuDeviceGet
                    0.00%     700ns      1     700ns     700ns     700ns  cuDeviceGetName
                    0.00%     300ns      1     300ns     300ns     300ns  cuDeviceGetLuid
                    0.00%     200ns      1     200ns     200ns     200ns  cuDeviceGetUuid
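
For reference, here is a minimal sketch of the kind of cudaEvent timing being compared against the nvprof numbers. It is not the exact code from my project: the empty kernel body, the <<<1, 1>>> launch configuration, and the helper name time_kernel_ms are placeholders assumed for illustration. The helper can run untimed warm-up launches first, because the first launch in a process may carry one-time setup cost that would not appear in nvprof's GPU activities line.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: the real my_kernel body is not shown in this post.
__global__ void my_kernel(void) {}

// Hypothetical helper: times one launch of my_kernel with CUDA events.
// Runs `warmup_launches` untimed launches first. Returns the elapsed time
// in milliseconds, or -1.0f if a CUDA call fails.
float time_kernel_ms(int warmup_launches) {
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    // Untimed warm-up launches (launch configuration is an assumption).
    for (int i = 0; i < warmup_launches; ++i) {
        my_kernel<<<1, 1>>>();
    }
    cudaDeviceSynchronize();

    // Timed region: the interval between the two events on the default stream.
    cudaEventRecord(start);
    my_kernel<<<1, 1>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the stop event has completed

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (err == cudaSuccess) ? ms : -1.0f;
}

int main() {
    // No warm-up: the timed launch is the first kernel launch in the process.
    printf("first launch: %.3f ms\n", time_kernel_ms(0));
    // One warm-up launch before the timed launch.
    printf("after warm-up: %.3f ms\n", time_kernel_ms(1));
    return 0;
}
```

Comparing the two printed values against the my_kernel(void) row above may help show how much of the cudaEvent time is launch overhead rather than kernel execution.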