CUPTI activity_trace_async example doesn't work with CUDA/CUPTI 10.1.243 (Update 2)

Building and running the samples/activity_trace_async example produces only this output:

CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE = 8388608
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT = 100
Device Name: Tesla V100-SXM2-32GB
Device Name: Tesla V100-SXM2-32GB
Device Name: Tesla V100-SXM2-32GB
Device Name: Tesla V100-SXM2-32GB

…but the bufferCompleted() function never gets called.

This same example (with minor downgrades) works fine with version 9.2.

System: IBM Power 9 node, ppc64le, using gcc 7.3.

Hi, can you run other samples successfully?
I have a very strange issue here:
https://devtalk.nvidia.com/default/topic/1065105/cuda-profiler-tools-interface-cupti-/why-cuptieventgroupenable-always-reports-quot-cupti_error_invalid_parameter-quot-error-/

Interesting that you ask. I get this output from activity_trace and callback_event:

[khuck@cyclops activity_trace]$ ./activity_trace
Device Name: Tesla V100-SXM2-32GB
Device Name: Tesla V100-SXM2-32GB
Device Name: Tesla V100-SXM2-32GB
Device Name: Tesla V100-SXM2-32GB
activity_trace.cpp:355: error: function cuptiActivityFlushAll(0) failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.

[khuck@cyclops callback_event]$ ./callback_event 0
Usage: ./callback_event [device_num] [event_name]
CUDA Device Number: 0
CUDA Device Name: Tesla V100-SXM2-32GB
callback_event.cu:239:Error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES for CUPTI API function ‘cuptiSubscribe’.

I know about the fix for that, but I am waiting for an administrator (who knows what they are doing) to set the modprobe flag. However, we don’t get that error when subscribing to the asynchronous activity API.
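
For anyone else hitting CUPTI_ERROR_INSUFFICIENT_PRIVILEGES: as far as I know, the modprobe flag in question is the nvidia kernel module option NVreg_RestrictProfilingToAdminUsers=0, and the driver reports the current setting as RmProfilingAdminOnly in /proc/driver/nvidia/params (1 means profiling is restricted to admin users). Below is a minimal sketch, assuming that file layout, of checking the setting from a program before calling into CUPTI. The function names parseProfilingAdminOnly and readProfilingAdminOnly are made up for illustration and are not part of CUPTI or the samples. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: scan text in the "Key: value" layout that
// /proc/driver/nvidia/params is assumed to use, and return the integer
// value of RmProfilingAdminOnly if that key is present and numeric.
std::optional<int> parseProfilingAdminOnly(const std::string& paramsText) {
    const std::string key = "RmProfilingAdminOnly:";
    std::istringstream in(paramsText);
    std::string line;
    while (std::getline(in, line)) {
        // Match only lines that start with the key.
        if (line.compare(0, key.size(), key) == 0) {
            try {
                // std::stoi skips the whitespace after the colon.
                return std::stoi(line.substr(key.size()));
            } catch (const std::exception&) {
                return std::nullopt;  // key found, but value is not a number
            }
        }
    }
    return std::nullopt;  // key not found
}

// Hypothetical helper: read the params file and parse it. Returns
// std::nullopt if the file cannot be opened (e.g. driver not loaded).
std::optional<int> readProfilingAdminOnly(
    const std::string& path = "/proc/driver/nvidia/params") {
    std::ifstream file(path);
    if (!file) {
        return std::nullopt;
    }
    std::stringstream contents;
    contents << file.rdbuf();
    return parseProfilingAdminOnly(contents.str());
}
```

If readProfilingAdminOnly() returns 1, a non-root process should expect the privilege error from calls such as cuptiSubscribe. A result of 0 suggests the restriction is lifted. No value means the setting could not be determined this way.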