BUG REPORT: nvidia-smi shows 0% GPU-Util when sampling elapsed_cycles_sm event

Description:

When the `elapsed_cycles_sm` event is sampled in continuous sampling mode with the CUPTI library, the `nvidia-smi` command-line tool shows 0% GPU-Util. With the `inst_executed` event, however, GPU-Util is reported as expected.

How to reproduce:

Modify the official example file `event_sampling/event_sampling.cu` by changing the macro definition of `EVENT_NAME` from `inst_executed` to `elapsed_cycles_sm`. Build the example and run it concurrently with `nvidia-smi`. The GPU-Util value shown by `nvidia-smi` doesn't change at all.
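To make the comparison between the two events easier to record, GPU-Util can be polled from a script while the sample runs. The following is only a minimal sketch, not part of the CUPTI sample: it assumes `nvidia-smi` is on the `PATH` and supports the `--query-gpu=utilization.gpu` query, and the helper names `parse_utilization` and `read_utilization` are made up for illustration.

```python
import subprocess
import time


def parse_utilization(text):
    """Parse the output of
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`.

    Returns one entry per GPU: an int percentage, or None when the
    field is not a plain number (e.g. "[N/A]").
    """
    values = []
    for line in text.splitlines():
        field = line.strip()
        if not field:
            continue  # skip blank lines
        values.append(int(field) if field.isdigit() else None)
    return values


def read_utilization():
    """Run nvidia-smi once and return the parsed GPU-Util values."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_utilization(result.stdout)


if __name__ == "__main__":
    # Poll once per second for 10 seconds while the CUPTI sample is running.
    for _ in range(10):
        print(time.strftime("%H:%M:%S"), read_utilization())
        time.sleep(1)
```

Running this once with `EVENT_NAME` set to `inst_executed` and once with `elapsed_cycles_sm` gives two logs that can be attached to the report.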

Tested platform:

GPU: K40m, P100

CUDA driver:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  396.26  Mon Apr 30 18:01:39 PDT 2018
GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)

nvidia-smi version:

NVIDIA-SMI 396.26                 Driver Version: 396.26

If you wish to file a bug report, do so by following the process here:

https://devtalk.nvidia.com/default/topic/1044668/cuda-programming-and-performance/-how-to-report-a-bug/