What is the "other" category in "Kernel Profiling: PC Sampling"?

I am new to nvprof. I am using Kernel Profiling to see where my kernel spends most of its time, and about 33% of the time is attributed to “other”. Can someone help me understand what this category means, or give me some hints about what it might include?

For background, I have modified a sample matrix multiplication to use the persistent thread model. I basically introduced some atomicAdd() calls and a few writes to host pinned memory. With these changes I see a 30% overhead compared to the baseline simple matrix multiplication. The “other” category is 17% in the baseline, whereas in the persistent thread model it is 33%. Can someone tell me how to figure out where this overhead comes from?

Question: Can atomicAdd() take up to 5 µs under heavy contention? (At most 30 threads across the whole GPU will call atomicAdd() at the same time.)
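
To put a rough number on the atomicAdd() question, a standalone microbenchmark along these lines might help. It is only a sketch under assumptions: the kernel name contendedAtomics, the sink buffer, the 30 blocks of one thread each, and the iteration count are all made up for illustration and are not taken from my real kernel. It also reports an average per-call time, not a worst case, and it assumes the counter lives in device memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical microbenchmark kernel (not the real persistent-thread kernel):
// every thread issues `iters` atomicAdd() calls on one shared counter.
__global__ void contendedAtomics(unsigned int *counter, unsigned int *sink, int iters)
{
    unsigned int inc = 1u;
    for (int i = 0; i < iters; ++i) {
        // The next increment depends on the value returned by this call,
        // so the atomics issued by one thread cannot overlap each other.
        unsigned int old = atomicAdd(counter, inc);
        inc = (old & 1u) + 1u;
    }
    // Write the final value out so the dependency chain is observable.
    sink[blockIdx.x * blockDim.x + threadIdx.x] = inc;
}

int main()
{
    // Assumption: 30 concurrent callers, modelled as 30 blocks of 1 thread.
    const int blocks = 30, threads = 1, iters = 100000;

    unsigned int *counter = nullptr, *sink = nullptr;
    cudaMalloc(&counter, sizeof(unsigned int));
    cudaMalloc(&sink, blocks * threads * sizeof(unsigned int));
    cudaMemset(counter, 0, sizeof(unsigned int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    contendedAtomics<<<blocks, threads>>>(counter, sink, 1);

    cudaEventRecord(start);
    contendedAtomics<<<blocks, threads>>>(counter, sink, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Kernel launch overhead is included but amortized over `iters` calls.
    printf("average time per atomicAdd() per thread: %.3f us\n",
           ms * 1000.0f / iters);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(counter);
    cudaFree(sink);
    return 0;
}
```

Comparing the result for blocks = 1 against blocks = 30 should give a feel for how much contention alone adds, though the access pattern in the real kernel may behave differently.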

See https://stackoverflow.com/questions/14887807/what-are-other-issue-stall-reasons-displayed-by-the-nsight-profiler