OpenACC and C++: undefined reference to `cudaProfilerStart'

Hello,

I am trying to use cudaProfilerStart() and cudaProfilerStop() for focused profiling of a C++ program accelerated with OpenACC. I have the line #include <cuda_profiler_api.h> at the top of my program, but the link stage fails with errors like "undefined reference to `cudaProfilerStart'".

Clearly something is not set up correctly. Do I need to link against a particular library? The cuda_profiler_api.h header is located in ~/pgi/linux86-64/2018/cuda/10.0/include.

Thanks,
Shine

Hi Shine,

Try adding the flag “-Mcuda” to the link so the CUDA libraries are added.
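
For reference, here is a minimal sketch of what focused profiling around a single OpenACC region might look like. The saxpy function and the USE_CUDA_PROFILER macro are hypothetical names used only for illustration (they do not come from this thread); the guard simply lets the same file build on a machine without the CUDA headers. Only cudaProfilerStart(), cudaProfilerStop(), and the -Mcuda flag are taken from the discussion above.

```cpp
// Sketch: restrict profiling to one OpenACC region.
// USE_CUDA_PROFILER is a hypothetical macro; define it when building with
// CUDA support, for example with PGI:
//   pgc++ -acc -Mcuda -DUSE_CUDA_PROFILER saxpy.cpp
#include <cstddef>
#include <vector>

#ifdef USE_CUDA_PROFILER
#include <cuda_profiler_api.h>  // declares cudaProfilerStart/cudaProfilerStop
#endif

// Hypothetical example kernel: y = a*x + y, offloaded when built with -acc.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

#ifdef USE_CUDA_PROFILER
    cudaProfilerStart();  // collection starts here under --profile-from-start off
#endif

    // Without OpenACC enabled, the pragma is ignored and the loop runs on the host.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }

#ifdef USE_CUDA_PROFILER
    cudaProfilerStop();   // collection ends here
#endif
}
```

Because the profiler functions live in the CUDA runtime library, the -Mcuda flag is needed on the link line as well as the compile line when the two steps are separate.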

Hope this helps,
Mat

Thanks a lot, Mat! Your solution works like a charm.

Thanks,
Shine

A follow-up question:

One of my motivations for focused profiling was to get rid of errors in nvvp. However, the same error remains: when I import the profiling output created by nvprof, a message reads: “The start and end time stamps on 988 kernels, memcpys, and other collected profile data are invalid. Those profiling records have been dropped and will not be displayed in the timeline.” I then see empty regions in nvvp’s timeline, which hinders optimization efforts.

I run the code from a batch script that first sets up the environment (which is why creating sessions directly inside nvvp is less convenient) and then launches nvprof. The launch line reads “mpirun ${ompi_options} -np ${NBPROC} nvprof --profile-from-start off -o GPU_prof.%q{OMPI_COMM_WORLD_RANK}.nvprof ${SOLVER} ${RUN_DIR}/*.run”

The generated *.nvprof file is less than 3 MB, so the error is unlikely to be caused by loading a large profile. Some online searches suggest calling “cudaDeviceReset()”, but that does not solve the problem for me. Could you comment on the error?

Thanks,
Shine

Just a dummy reply to move this thread up to the list so that someone could comment on my follow-up question above.

Thanks,
Shine

Hi Shine,

What CUDA driver do you have installed?

It seems that a recent update to the CUDA drivers has disabled user level profiling due to a potential security risk.

See: https://nvidia.custhelp.com/app/answers/detail/a_id/4738
With the authorized workaround at: https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters

I started seeing similar errors on my system after updating the CUDA drivers and needed my IT folks to apply the workaround. Profiling worked again after that.

-Mat

Hi Mat,

Sorry for my slow reply. Here is the output of nvcc --version (this reports the CUDA toolkit version rather than the driver version):

[shine@dummy]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Since I do not see the “ERR_NVGPUCTRPERM” error, I wonder whether my case is instead the known issue described in the profiler guide: “If the kernel launch rate is very high, the device memory used to collect profiling data can run out. In such a case some profiling data might be dropped. This will be indicated by a warning.” (see https://docs.nvidia.com/cuda/profiler-users-guide/index.html#profiler-known-issues) My application does launch kernels quite frequently. Do you think the warning message that I received is the one described there?

My warning message is:

The start and end time stamps on 988 kernels, memcpys, and other collected profile data are invalid. Those profiling records have been dropped and will not be displayed in the timeline.
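
If the dropped records really are caused by a high kernel launch rate (an unconfirmed hypothesis at this point), one way to test it might be to profile only a short window of time steps so that fewer launches are recorded. The sketch below assumes the solver has a time-step loop; ProfileWindow, run_with_window, and the USE_CUDA_PROFILER macro are hypothetical names for illustration, not part of my code or of any NVIDIA API.

```cpp
// Sketch: profile only a short window of time steps to limit how many
// kernel launches the profiler records. All names are hypothetical.
#include <cstddef>

#ifdef USE_CUDA_PROFILER
#include <cuda_profiler_api.h>
#endif

struct ProfileWindow {
    std::size_t first_step;  // first time step to profile
    std::size_t num_steps;   // number of consecutive steps to profile

    // True when `step` lies inside [first_step, first_step + num_steps).
    bool contains(std::size_t step) const {
        return step >= first_step && step - first_step < num_steps;
    }
};

// Runs `total_steps` iterations of `do_step`, enabling the profiler only
// inside `window`. Returns the number of steps that were profiled.
template <typename Step>
std::size_t run_with_window(std::size_t total_steps, ProfileWindow window,
                            Step do_step) {
    std::size_t profiled = 0;
    bool active = false;
    for (std::size_t step = 0; step < total_steps; ++step) {
        const bool want = window.contains(step);
        if (want && !active) {
#ifdef USE_CUDA_PROFILER
            cudaProfilerStart();
#endif
            active = true;
        } else if (!want && active) {
#ifdef USE_CUDA_PROFILER
            cudaProfilerStop();
#endif
            active = false;
        }
        do_step(step);
        if (active) ++profiled;
    }
    if (active) {
        // The window ran past the last step; stop collection before returning.
#ifdef USE_CUDA_PROFILER
        cudaProfilerStop();
#endif
    }
    return profiled;
}
```

If the warning disappears or the number of dropped records shrinks when the window is reduced to a few steps, that would be consistent with the launch-rate explanation, though it would not prove it.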

Thanks,
Shine