CUDA Profile Tool Warnings

BHC · May 21, 2010, 7:45pm

I am attempting to use the CUDA profiler tool to optimize my application. I have copied the output below. It appears that my value of warp_serialize is high, but I’m not quite sure what is causing it. I have two specific questions regarding this output, but of course any other advice is welcome.

1. Why am I getting all these warning messages? I would like to profile my local_load, divergent_branch, etc.
2. Besides divergent branches and shared memory bank conflicts, is there anything else that can cause the value of warp_serialize to be high?

-------------- output -----------------

NV_Warning: Signal gst_coherent can not be profiled in this run.
NV_Warning: Signal gst_incoherent can not be profiled in this run.
NV_Warning: Signal gld_32b can not be profiled in this run.
NV_Warning: Signal gld_64b can not be profiled in this run.
NV_Warning: Signal gld_128b can not be profiled in this run.
NV_Warning: Signal gld_request can not be profiled in this run.
NV_Warning: Signal local_load can not be profiled in this run.
NV_Warning: Signal local_store can not be profiled in this run.
NV_Warning: Signal branch can not be profiled in this run.
NV_Warning: Signal divergent_branch can not be profiled in this run.
NV_Warning: Signal instructions can not be profiled in this run.
NV_Warning: Signal warp_serialize can not be profiled in this run.
NV_Warning: Signal cta_launched can not be profiled in this run.

CUDA_PROFILE_LOG_VERSION 1.6

CUDA_DEVICE 1 Tesla T10 Processor

TIMESTAMPFACTOR fffff72c43210a40

timestamp,method,gputime,cputime,regperthread,occupancy,cta_launched,warp_serialize,gld_coherent,gld_incoherent
timestamp=[ 2966.000 ] method=[ memcpyHtoD ] gputime=[ 6.176 ] cputime=[ 5.000 ]
timestamp=[ 1255162.000 ] method=[ memcpyHtoD ] gputime=[ 257933.250 ] cputime=[ 258374.031 ]
timestamp=[ 1513556.000 ] method=[ memcpyHtoD ] gputime=[ 266559.781 ] cputime=[ 266961.000 ]
timestamp=[ 1783791.000 ] method=[ memcpyHtoD ] gputime=[ 32.832 ] cputime=[ 70.000 ]
timestamp=[ 1783866.000 ] method=[ memcpyHtoD ] gputime=[ 32.800 ] cputime=[ 63.000 ]
timestamp=[ 1783929.875 ] method=[ memcpyHtoD ] gputime=[ 32.864 ] cputime=[ 63.000 ]
timestamp=[ 1783994.000 ] method=[ memcpyHtoD ] gputime=[ 32.864 ] cputime=[ 62.000 ]
timestamp=[ 1784057.000 ] method=[ memcpyHtoD ] gputime=[ 33.024 ] cputime=[ 62.000 ]
timestamp=[ 1786586.000 ] method=[ memcpyHtoD ] gputime=[ 182.144 ] cputime=[ 322.000 ]
timestamp=[ 1786912.000 ] method=[ memcpyHtoD ] gputime=[ 179.264 ] cputime=[ 322.000 ]
timestamp=[ 1787274.000 ] method=[ memcpyHtoD ] gputime=[ 4.064 ] cputime=[ 3.000 ]
timestamp=[ 1787336.000 ] method=[ Z10computeSARPfS_S_S_S_S_S_S_S_S ] gputime=[ 4039637.000 ] cputime=[ 4039645.000 ] regperthread=[ 59 ] occupancy=[ 0.250 ] cta_launched=[ 26 ] warp_serialize=[ 367774596 ] gld_coherent=[ 77185382 ] gld_incoherent=[ 0 ]
timestamp=[ 5827141.000 ] method=[ memcpyDtoH ] gputime=[ 192.384 ] cputime=[ 744.000 ]
timestamp=[ 5827888.000 ] method=[ memcpyDtoH ] gputime=[ 181.120 ] cputime=[ 703.000 ]
timestamp=[ 5828593.000 ] method=[ memcpyDtoH ] gputime=[ 4.992 ] cputime=[ 16.000 ]

tera · May 21, 2010, 8:13pm

1. I don’t know the latest versions of the profiler, but I believe there is a limit on the number of counters you can profile in one run. So you just need multiple runs, sampling a few counters each time.
2. Atomic operations can cause warp serialization, as can the built-in transcendental functions (through conditionals in the library implementation).
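
The multiple-run approach could be scripted roughly as follows. This is only a sketch: the counter names are copied from the warnings in the question, while the limit of four counters per run, the helper names `split_into_runs` and `write_configs`, and the `profile_configs` directory are assumptions for illustration. Check the profiler documentation for the actual limit on your device.

```python
from pathlib import Path

# Counter names taken from the warnings and log header in the question.
COUNTERS = [
    "gld_coherent", "gld_incoherent", "gst_coherent", "gst_incoherent",
    "gld_32b", "gld_64b", "gld_128b", "gld_request",
    "local_load", "local_store", "branch", "divergent_branch",
    "instructions", "warp_serialize", "cta_launched",
]

# Assumption: the profiler accepts at most four counters per run.
MAX_COUNTERS_PER_RUN = 4


def split_into_runs(counters, per_run=MAX_COUNTERS_PER_RUN):
    """Group counter names into consecutive chunks of at most `per_run`."""
    return [counters[i:i + per_run] for i in range(0, len(counters), per_run)]


def write_configs(counters, out_dir="profile_configs"):
    """Write one config file per run, one counter name per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, group in enumerate(split_into_runs(counters)):
        path = out / f"run{n}.cfg"  # hypothetical file naming scheme
        path.write_text("\n".join(group) + "\n")
        paths.append(path)
    return paths
```

Each generated file would then be used for one run of the application, for example by setting CUDA_PROFILE=1 and pointing CUDA_PROFILE_CONFIG at that file, and the per-run logs merged afterwards.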

BHC · May 30, 2010, 4:31pm

Thank you for your reply. You were correct about the profiler. Multiple runs permitted me to get results for all the tests. I have posted a new question regarding shared memory bank conflicts. If you get a chance…

http://forums.nvidia.com/index.php?showtopic=170031
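
For anyone reasoning about shared memory bank conflicts as a source of warp_serialize, a rough host-side model can help estimate how badly an access pattern serializes. The sketch below assumes the compute capability 1.x layout (16 banks of 32-bit words, requests issued per half-warp of 16 threads), which should apply to the Tesla T10. The function names are hypothetical, and the broadcast handling is simplified: threads reading the same word are treated as a single access.

```python
from collections import defaultdict

NUM_BANKS = 16  # assumption: compute capability 1.x shared memory
HALF_WARP = 16  # threads served by one shared memory request


def conflict_degree(word_indices):
    """Estimate the number of serialized passes for one half-warp.

    word_indices[t] is the 32-bit word index that thread t accesses.
    Successive words map to successive banks, so the bank is
    word index modulo NUM_BANKS. Distinct words that fall into the
    same bank are assumed to need one pass each.
    """
    words_per_bank = defaultdict(set)
    for idx in word_indices:
        words_per_bank[idx % NUM_BANKS].add(idx)
    return max(len(words) for words in words_per_bank.values())


def strided(stride):
    """Word indices for the access pattern shared[threadIdx.x * stride]."""
    return [t * stride for t in range(HALF_WARP)]
```

Under this model, `conflict_degree(strided(1))` and `conflict_degree(strided(17))` both give 1 (no conflict), `conflict_degree(strided(2))` gives 2, and `conflict_degree(strided(16))` gives 16, the worst case where every thread hits the same bank.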

kumazaku · February 16, 2011, 12:48am

I have the same problem. Please give me a solution.
